Abstract — In the setting of quasi-static multiple-input multiple-output
(MIMO) channels, we consider the high signal-to-noise ratio
(SNR) asymptotic complexity required by the sphere decoding
(SD) algorithm for decoding a large class of full rate linear space- 
time codes. With SD complexity having random fluctuations 
induced by the random channel, noise and codeword realizations, 
the introduced SD complexity exponent manages to concisely 
describe the computational reserves required by the SD algorithm 
to achieve arbitrarily close to optimal decoding performance. 
Bounds and exact expressions for the SD complexity exponent 
are obtained for the decoding of large families of codes with 
arbitrary performance characteristics. For the particular exam- 
ple of decoding the recently introduced threaded cyclic division 
algebra (CDA) based codes - the only currently known explicit 
designs that are uniformly optimal with respect to the diversity 
multiplexing tradeoff (DMT) - the SD complexity exponent is 
shown to take a particularly concise form as a non-monotonic 
function of the multiplexing gain. To date, the SD complexity 
exponent also describes the minimum known complexity of any 
decoder that can provably achieve a gap to maximum likelihood 
(ML) performance which vanishes in the high SNR limit. 

Index Terms — Diversity-Multiplexing Tradeoff, Sphere Decod- 
ing, Complexity, Space-Time Codes, Large Deviations. 



I. Introduction 

The past decade has seen the abundant use of the sphere 
decoding (SD) algorithm [1]-[4] as a tool for facilitating
near maximum likelihood (ML) decoding over the coherent 
delay-limited (or quasi-static) multiple-input multiple-output 
(MIMO) channel. The SD algorithm allows for efficient opti- 
mal or near optimal decoding of a large number of high rate 
space-time codes that map constituent constellation symbols 
linearly in space and time [1]. As the algorithm's computational
cost depends on the fading channel, it is generally
known that in implementing the SD algorithm, one can
trade off computational complexity for error performance by
selectively choosing when to decode and when not to. Equivalently,
in the presence of constraints on the computational
reserves that may be allocated to decoding, the algorithm is 
faced with the prospect of encountering channel realizations 
that force it to violate its run-time constraints, thus having 
to declare decoding outages that inevitably reduce reliability.
This naturally raises the intriguing question of how large 
computational reserves are actually required for near ML 
performance. 

While this question is hard to answer in general, or even to ask
in a rigorously meaningful way, we show herein that by following
[5] and considering the decoding of sequences of codes
in the high signal-to-noise ratio (SNR) limit, not only can the
question be made rigorous, it also admits surprisingly simple,
explicit and general answers. Drawing from the diversity
multiplexing tradeoff (DMT) setting, which has already been
successfully applied to concisely describe the high SNR diversity
exponent in the reliability analysis of reduced complexity
decoders [6]-[8], we introduce the SD complexity exponent
as a measure of complexity of the SD algorithm. The SD 
complexity exponent characterizes the decoding complexity 
in the high SNR limit under the assumption that the code-rate 
scales with SNR in order to provide a given multiplexing gain. 
This approach naturally takes into account the dependency of 
the SD complexity on the codeword density and the codebook 
size, as well as the SNR and the fading characteristics of 
the wireless channel. Similar to previous work on the DMT 
relating the code rate and probability of decoding error, it is 
seen that also the complexity, although hard to characterize at 
any finite SNR, has mathematically tractable characterizations 
in the high SNR asymptote. These characterizations in turn 
yield valuable insights into the behavior of the algorithm. 

The SD algorithm is equivalent to a branch-and-bound
search [4] over a regular tree and, like most other works on SD
complexity [2], [3], [9]-[11], we view the number of visited
nodes $N$ as the complexity of the algorithm^1. To identify
an appropriate scale of interest for complexity at high SNR,
it is useful to note that in order to achieve a multiplexing
gain of $r$ the code must in the high SNR limit have a rate of^2
$R = r \log\rho + o(\log\rho)$ bits per channel use, where $\rho$ denotes
the SNR. Consequently, the cardinality of the codebook is
$|\mathcal{X}| \doteq \rho^{rT}$, where $T$ is the codeword length and where $\doteq$
denotes equality in the SNR exponent (cf. [5] and Section
I-B). In the worst case, as will be shown later, the sphere
decoder is in essence forced to perform a complete search
over the entire codebook, and its complexity is in this case
$\rho^{rT}$. However, although feasible, this event is also highly
improbable. Thus, in order to quantify the probability that the
sphere decoder visits a certain (large) number of nodes, we
^1 In the context of the DMT this is, as we argue later on, equivalent to
measuring complexity in floating point operations (flops).

^2 Herein, log denotes the base-2 logarithm and $o(\cdot)$ is the standard Landau
notation where $f(\rho) = o(\phi(\rho))$ implies $\lim_{\rho\to\infty} f(\rho)/\phi(\rho) = 0$. Similarly,
$f(\rho) = O(\phi(\rho))$ implies that $\limsup_{\rho\to\infty} |f(\rho)|/\phi(\rho) < \infty$ for $\phi(\rho) > 0$.
introduce a (complexity) rate-function $\Psi(x)$ over $0 \le x \le rT$
given implicitly by $P(N \ge \rho^x) \doteq \rho^{-\Psi(x)}$, where $N$ is the
complexity of the SD algorithm. In short, $\Psi(x)$ captures the
decay-rate of the probability that the complexity exceeds a
given SNR dependent threshold $\rho^x$, or equivalently, that the
algorithm visits a specific sizable subset of the codebook. This
decay-rate should be contrasted with the minimum probability
of decoding error, which vanishes in the high SNR limit as
$\rho^{-d(r)}$, where $d(r)$ is the diversity gain of the code under
maximum likelihood (ML) decoding. We can thus judiciously
argue that for any $x$ such that $\Psi(x) > d(r)$, the probability
that the complexity exceeds $\rho^x$ is at high SNR insignificant in
comparison to the overall probability of error of the decoder.
In other words, for $x$ such that $\Psi(x) > d(r)$, imposing a
run-time limit of $\rho^x$ on the complexity of the algorithm - and
declaring a decoding outage whenever this limit is not met -
would cause a vanishing degradation in terms of the overall
error probability at high SNR. This motivates us to deviate
from the traditional worst-case complexity measure, which fails to
meaningfully describe the effective complexity, and to define
the SD complexity exponent $c(r)$ as the infimum of all $x$ for
which $\Psi(x) > d(r)$. In essence, $\rho^{c(r)}$ represents the minimum
computational reserves required for achieving DMT optimal
performance using the SD algorithm. Precise definitions of
$c(r)$, and a rigorous treatment of the notion of a vanishing
degradation in the overall error probability, are given in Section
III-C and by Theorem 1. The main topic of this work will then
be to give closed form expressions, and bounds, for the SD
complexity exponent $c(r)$ when decoding different classes of
full rate linear codes, to be described later, including the codes
proposed in [12]-[17].
Most other works on sphere decoding complexity consider
uncoded (spatially multiplexed) systems and asymptotic results
in terms of the signal space dimension, see, e.g., [9], [10],
[18], [19]. Our work is instead more related to the analysis
in [11], which considers the complexity tail distribution for a
fixed signal space dimension. However, unlike [11] we also
incorporate the space-time codes into the analysis, as well
as the SNR scalings of the rates of these codes mandated
by the DMT. In parallel with our work, the work in [20]
provides an analysis of the complexity tail distribution for
unconstrained lattice sequential decoders, in the presence of
DMT optimal random lattice codes. A fundamental difference
between our work and [20] is that [20] considers unconstrained
lattice decoding, whereas we explicitly take into account the
constellation boundary in the decoder. Another difference is
that the lattice codes considered in our work can be explicitly
constructed and can have arbitrary DMT performance, unlike
the random codes in [20], which are non-explicit and which are
restricted to being DMT optimal. We also take our analysis
one step further by coupling the complexity tail distribution to
the DMT performance of the code in order to obtain the SD
complexity exponent. Regarding the ultimate complexity limits
on DMT optimal decoding, we have previously established that
lattice reduction (LR)-aided linear decoders are sufficient for
achieving the entire DMT tradeoff at a worst-case complexity
of $O(\log\rho)$, i.e., corresponding to a complexity exponent of
$c(r) = 0$. This is lower than the SD complexity exponent
that we will present in what follows. However, one notable
difference is that the statements made herein are fundamentally
stronger in terms of error probability, as they not only imply
full diversity but also a vanishing SNR gap to the ML decoder
(see Theorem 1 for details). Such a result was not established
for the decoders in [8], [20].

A. Outline and contributions 

The general definition of the SD complexity exponent is
given in Definition 1, and Theorem 1 then describes how
sphere decoding and the time-out policies to be employed
can guarantee a gap to ML that vanishes with increasing
SNR. However, before proceeding with the statement of these
results, we first consider the code-channel system, describe
the basic workings of the SD algorithm, and handle different
pertinent aspects that are necessary for the exposition that
follows.

Following the definition of the SD complexity exponent,
Theorem 2 gives, in the form of an optimization problem,
a general upper bound $\bar{c}(r)$ on the SD complexity exponent
$c(r)$ when decoding any full rate code with multiplexing
gain $r$ and diversity $d(r)$. An explicit closed form expression
for $\bar{c}(r)$ is then given in Theorem 3 for all DMT optimal
full rate codes. The bound $\bar{c}(r)$ is already useful in itself in
that it establishes that the SD complexity exponent is much
lower than the worst-case SNR exponent $rT$ associated with
a full search of the codebook. However, in the interest of also
establishing the tightness of this bound, Lemma 2 provides
easy-to-check sufficient conditions on the generator matrix of
the code lattice that guarantee the tightness of $\bar{c}(r)$ in the
most general setting. Building on this, Theorem 4 establishes
that, given any full rate design of arbitrary DMT performance,
there is always at least one non-random SD column ordering
[2], [3] for which $c(r) = \bar{c}(r)$, i.e., for which the exact $c(r)$
can be explicitly calculated from the result of Theorem 2.
Theorem 5 goes one step further and establishes the exact SD
complexity exponent, given any threaded code design and the
natural column ordering, to be $c(r) = \bar{c}(r)$, while Theorem 6
provides an explicit expression for $c(r)$ for any DMT optimal
threaded code design. Surprisingly, this simple expression (see
Fig. 1) can also serve as an upper bound on $c(r)$ for any
full-rate code, irrespective of the fading statistics. Finally, and
along a different path, Theorem 7 establishes $c(r)$ for any $2 \times 2$
approximately universal code [21], irrespective of its specific
structure, thus identifying the exact $c(r)$ even for possibly
undiscovered code structural designs, as long as these designs
are approximately universal, i.e., as long as they achieve DMT
optimality for all fading statistics. Some general discussions
of these results are then provided in Section V.

The SD complexity exponent for decoding the class of full
rate threaded DMT optimal codes with minimum delay, i.e.,
for which $n_T = T = n$ where $n_T$ denotes the number of
transmit antennas, is shown^3 in Fig. 1 for $n = 2, \ldots, 6$.
The results shown in the figure apply to codes such as those
presented in [12]-[17]. Before proving the aforementioned

^3 A closed form expression for the complexity exponent $c(r)$ shown in
Fig. 1 is given by (50) in Theorem 6.
[Figure 1: plot of the SD complexity exponent c(r) versus the multiplexing gain r]

Fig. 1. The SD complexity exponent $c(r)$ for decoding threaded minimum
delay DMT optimal codes with $n_T = T = n$ for $n = 2, \ldots, 6$. The SD
complexity exponent is illustrated by the bold lines. The same exponent also
serves as an upper bound to the SD complexity exponent when decoding
any minimum delay DMT optimal full rate linear dispersive code. The thin
lines show the quadratic function given by $r(n - r)$, which provides the exact
complexity exponent at integer multiplexing gains.

results, it is worth commenting on the somewhat counter-intuitive
result suggested by the SD complexity exponent
in Fig. 1, namely that while $c(r)$ initially increases as a
function of the multiplexing gain $r$, it then decreases as $r$
approaches its maximum value $n_T$. The initial increase can
easily be explained by the fact that the cardinality (and density)
of the codebook $\mathcal{X}$ increases as a function of $r$. However,
the decrease at high multiplexing gains can be understood
in light of the coupling of the complexity with the overall
probability of error: at high multiplexing gains the
error probability is also higher, and this implies that the decoder
may time out for a larger set of problem instances without
seriously affecting the overall performance, leading to an
overall reduction in decoding complexity. Pushing the code-decoder
pair towards the maximal data rate does therefore not
imply that the decoding complexity is maximized. This effect
is discussed further in Section IV-A in terms of information
theoretic outages.
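The non-monotonic behaviour described above can be checked at integer multiplexing gains, where the caption of Fig. 1 states that the exponent equals $r(n - r)$. The following is a minimal sketch that only tabulates this quadratic; the function names are hypothetical, and the values between integer gains (given by Theorem 6) are not reproduced here.

```python
def integer_gain_exponent(n: int, r: int) -> int:
    """Value r(n - r) that the Fig. 1 caption gives for c(r) at integer r.

    Assumes a minimum delay threaded DMT optimal code with n_T = T = n.
    """
    if not 0 <= r <= n:
        raise ValueError("the multiplexing gain must satisfy 0 <= r <= n")
    return r * (n - r)


def exponent_table(n: int) -> list:
    """List of r(n - r) for r = 0, 1, ..., n."""
    return [integer_gain_exponent(n, r) for r in range(n + 1)]


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, exponent_table(n))
```

Each row rises towards the middle of the range and falls back to zero at $r = n$, in line with the discussion of time-outs at high multiplexing gains.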

B. Notation 

We let $\mathbb{Z}$, $\mathbb{R}$, and $\mathbb{C}$ denote the sets of integer, real and
complex numbers respectively, and $\mathbb{F}^n$ and $\mathbb{F}^{m \times n}$ the sets of
$n$-vectors and $m \times n$ matrices over $\mathbb{F} \in \{\mathbb{Z}, \mathbb{R}, \mathbb{C}\}$. Vectors are
denoted by lower-case bold letters $\mathbf{a}$, and matrices are denoted
by upper-case bold letters $\mathbf{A}$. We use $(\cdot)^T$ and $(\cdot)^H$ to denote
the transpose and Hermitian (conjugate) transpose of vectors
and matrices, and $\mathrm{vec}(\cdot) : \mathbb{C}^{m \times n} \to \mathbb{C}^{mn}$ to denote the matrix
to vector operation whereby the columns of the argument are
stacked on top of each other. We use $\mathbf{I}_n \in \mathbb{C}^{n \times n}$ to denote the
$n \times n$ identity matrix, and use $\mathbf{0}$ to denote the zero vector
or matrix where the dimensions are given by the context in
which $\mathbf{0}$ is used. Deviating slightly from standard usage, we
refer to a tall matrix $\mathbf{U} \in \mathbb{C}^{m \times n}$ where $m \ge n$ as unitary
if $\mathbf{U}^H \mathbf{U} = \mathbf{I}$. For $a \in \mathbb{C}$ we use $\Re(a)$, $\Im(a)$ to respectively
denote the real and imaginary parts of $a$. For $a \in \mathbb{R}$ we use
$\lfloor a \rfloor$ to denote the floor operation defined as the largest integer
smaller than or equal to $a$, $\lceil a \rceil$ to denote the ceil operation
defined as the smallest integer larger than or equal to $a$, and
we let $(a)^+ = \max(a, 0)$.

We let $\sigma_1(\mathbf{A}) \le \ldots \le \sigma_n(\mathbf{A})$ denote the ordered (structurally)
non-zero singular values [22] of a matrix $\mathbf{A} \in \mathbb{C}^{m \times n}$
where $m \ge n$. We will on occasion use $\sigma_{\max}(\mathbf{A})$ to denote the
largest singular value when the dimension of $\mathbf{A}$ is not explicit.
We make no notational difference between random variables 
and their realizations. We will use the notation $\Omega \triangleq \{\cdots\}$ to
label the stochastic event within the brackets.

Finally, in order to simplify notation we will make use of the
$\doteq$ (and $\dot{\le}$, $\dot{\ge}$, $\dot{<}$, $\dot{>}$) notation for equalities (and inequalities)
in the SNR exponent, cf. [5]. Specifically, we write $f(\rho) \dot{\le} \rho^a$
and $g(\rho) \dot{\ge} \rho^b$ to denote

$$\limsup_{\rho\to\infty} \frac{\log f(\rho)}{\log\rho} \le a \quad \text{and} \quad \liminf_{\rho\to\infty} \frac{\log g(\rho)}{\log\rho} \ge b$$

and $f(\rho) \doteq \rho^x$ when $f(\rho) \dot{\le} \rho^x$ and $f(\rho) \dot{\ge} \rho^x$ simultaneously
hold. The definition of $\dot{<}$ and $\dot{>}$ follows after replacing $\le$ by
$<$ and $\ge$ by $>$ in the above expressions.
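To make the exponent notation concrete, the ratio $\log f(\rho) / \log\rho$ can be evaluated at increasingly large $\rho$. The sketch below is an illustration only: the test function $f(\rho) = 3\rho^2 \log\rho$ and the helper name `snr_exponent` are assumptions chosen for the example, not quantities from the analysis. Constant and logarithmic factors do not affect the limit, so $f(\rho) \doteq \rho^2$.

```python
import math


def snr_exponent(f, rho: float) -> float:
    """Finite-SNR estimate log f(rho) / log rho of the SNR exponent of f."""
    return math.log(f(rho)) / math.log(rho)


def example_function(rho: float) -> float:
    """Illustrative function 3 * rho^2 * ln(rho); its SNR exponent is 2."""
    return 3.0 * rho ** 2 * math.log(rho)


if __name__ == "__main__":
    for rho in (1e6, 1e50, 1e100):
        print(rho, snr_exponent(example_function, rho))
```

The estimate approaches 2 only slowly, which is typical: sub-polynomial factors vanish in the exponent but remain visible at any finite SNR.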

II. Channel model and space-time codes 

We consider the standard block Rayleigh fading $n_T \times n_R$
quasi-static point-to-point MIMO channel model with
coherence-time $T$ given by

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{W} \tag{1}$$

where $\mathbf{X} \in \mathbb{C}^{n_T \times T}$, $\mathbf{Y} \in \mathbb{C}^{n_R \times T}$, and
$\mathbf{W} \in \mathbb{C}^{n_R \times T}$ denote the transmitted space-time block codeword, the
block of received signals, and additive spatially and temporally
white Gaussian noise, respectively. The channel gains $\mathbf{H} \in \mathbb{C}^{n_R \times n_T}$
are assumed to be i.i.d. circularly symmetric complex Gaussian^4
(i.e., Rayleigh fading) and constant over the duration of
the transmission (i.e., quasi-static Rayleigh fading). We shall
assume throughout that $n_R \ge n_T$. The transmitted codewords
$\mathbf{X}$ are assumed to be drawn uniformly from a codebook $\mathcal{X}$
and we assume that

$$E\{\|\mathbf{X}\|_F^2\} = \frac{1}{|\mathcal{X}|} \sum_{\mathbf{X} \in \mathcal{X}} \|\mathbf{X}\|_F^2 = \rho T \tag{2}$$

so that the parameter $\rho$ takes on the interpretation of an average
SNR. Note also here that one use of (1) is viewed as $T$ uses
of the wireless channel in the definition of the data-rate.

We shall herein restrict our attention to full rate (complex)
linear dispersive codes [23], [24] of the form

$$\mathbf{X} = \theta \sum_{i=1}^{\kappa} s_i \mathbf{D}_i \tag{3}$$

where $s_i \in \mathbb{S} \subset \mathbb{C}$ are constellation symbols drawn from a
finite alphabet $\mathbb{S}$, where $\{\mathbf{D}_i\}_{i=1}^{\kappa}$ is a set of linearly independent
dispersion matrices, and where $\theta$ is a parameter regulating
the transmit power. The notion of full rate implies that each
codeword $\mathbf{X} \in \mathbb{C}^{n_T \times T}$ carries $\kappa = n_T T$ constellation

^4 This assumption is relaxed in Section IV-B, where the extension to arbitrary
fading distributions is discussed.
symbols. We will further make the additional assumption that
$\mathbb{S}$ belongs to the class of QAM-like alphabets of the form

$$\mathbb{S}_\eta = \{ s \mid \Re(s), \Im(s) \in \mathbb{Z} \cap [-\eta, \eta] \} \tag{4}$$

where $\eta > 0$ is a parameter regulating the size of the
constellation^5. We will use $\mathbb{S}_\infty$ to denote the extended (infinite)
constellation obtained by letting $\eta = \infty$ in (4), and note
that $\mathbb{S}_\infty$ is nothing but the set of Gaussian integers. QAM
constellations will in general also include a translation and
scaling of the underlying lattice $\mathbb{S}_\infty$. However, as including
such a translation would not affect the results obtained herein,
and as the scaling can easily be included in the dispersion
matrices $\mathbf{D}_i$ or $\theta$, we omit these variations and concentrate on
(4) in the interest of notational simplicity.

The channel model in (1) may also be equivalently expressed
in a vectorized form according to

$$\mathbf{y} = (\mathbf{I}_T \otimes \mathbf{H})\mathbf{x} + \mathbf{w} \tag{5}$$

where $\mathbf{y} = \mathrm{vec}(\mathbf{Y})$, where $\mathbf{x} = \mathrm{vec}(\mathbf{X})$, where $\mathbf{w} = \mathrm{vec}(\mathbf{W})$,
and where $\otimes$ denotes the Kronecker product [25]. We shall
mainly work with (5) rather than (1) directly. In the vectorized
form the codewords $\mathbf{x}$ are given by

$$\mathbf{x} = \theta \mathbf{G} \mathbf{s}$$

for $\mathbf{s} \in \mathbb{S}_\eta^\kappa$, and where the full rank matrix

$$\mathbf{G} = [\mathrm{vec}(\mathbf{D}_1) \; \cdots \; \mathrm{vec}(\mathbf{D}_\kappa)] \tag{6}$$

is referred to as the generator matrix of the code. The linear
dispersive codes form a subset of the lattice codes [24] as the
codewords constitute a subset of the (complex) lattice $\theta \mathbf{G} \mathbb{S}_\infty^\kappa$.
The parameters $\theta$ and $\eta$ are, as noted, chosen in order to
satisfy given transmit power and rate constraints. In particular,
in order to ensure a multiplexing gain of

$$r \triangleq \lim_{\rho\to\infty} \frac{1}{T} \frac{\log |\mathcal{X}|}{\log\rho}, \tag{7}$$

or equivalently a rate of $R = r \log\rho + o(\log\rho)$, it must
hold that $\eta \doteq \rho^{rT/(2\kappa)}$, which by (2) and (4) implies that
$\theta^2 \doteq \rho^{1 - rT/\kappa}$. The code structure described above includes the
codes proposed in [12]-[15], [17], as well as the QAM-based
codes of [16], as special cases. Finally, we will throughout,
with slight abuse of terminology but still in line with [12]-[17],
use the term code when referring to the whole family
of codes that is generated by a single generator matrix $\mathbf{G}$
for different multiplexing gains and SNRs, and trust that no
confusion should follow by this usage.
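The scaling of the constellation with SNR can be illustrated numerically. The sketch below assumes the exponent relation $\eta = \rho^{rT/(2\kappa)}$ holds with equality (the text only fixes it in the exponent), counts the points of the QAM-like alphabet in (4) as $(2\lfloor\eta\rfloor + 1)^2$, and evaluates the finite-SNR ratio in (7). The function names are hypothetical.

```python
import math


def constellation_size(eta: float) -> int:
    """Number of points in the QAM-like alphabet of (4).

    Real and imaginary parts each range over the integers in [-eta, eta].
    """
    side = 2 * math.floor(eta) + 1
    return side * side


def multiplexing_gain_estimate(rho: float, r: float, T: int, kappa: int) -> float:
    """Finite-SNR value of log|X| / (T log rho) when eta = rho^(rT / (2 kappa)).

    Assumes every s in the kappa-fold product alphabet is a valid codeword.
    """
    eta = rho ** (r * T / (2 * kappa))
    log2_codebook = kappa * math.log2(constellation_size(eta))
    return log2_codebook / (T * math.log2(rho))


if __name__ == "__main__":
    # Example: n_T = T = 2, so kappa = n_T * T = 4, with target gain r = 1.
    for rho in (1e12, 1e30, 1e60):
        print(rho, multiplexing_gain_estimate(rho, r=1.0, T=2, kappa=4))
```

The estimate exceeds the target gain at finite SNR because $(2\eta + 1)^{2\kappa}$ carries a constant factor that disappears only in the limit.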

III. Decoding 
The coherent ML decoder for (1) is well known to be

$$\hat{\mathbf{X}}_{ML} = \arg\min_{\mathbf{X} \in \mathcal{X}} \|\mathbf{Y} - \mathbf{H}\mathbf{X}\|_F^2. \tag{8}$$



^5 The assumption of a square constellation is made here for simplicity
of exposition and in line with practical encoding schemes. This assumption
can readily be relaxed without affecting the presented results, as long
as the constellation is the same for all $s_i$, $i = 1, \ldots, \kappa$. A detailed exposition
of the mathematical machinery that allows for this relaxation can be found in
(U Section 111] 



The resulting diversity gain of the code, under ML decoding,
is correspondingly given by (cf. [5])

$$d(r) \triangleq -\lim_{\rho\to\infty} \frac{\log P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}{\log\rho} \tag{9}$$

where the notation $d(r)$ accentuates the dependence of the
diversity on the multiplexing gain $r$.

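One of the main features of the linear dispersive codes, as
was noted in the introduction, is that their lattice structure
allows for efficient - optimal and near optimal - solutions to
(8) using the sphere decoding algorithm. Using the linearity
of the map from $\mathbf{s}$ to $\mathbf{x} = \mathrm{vec}(\mathbf{X})$ we obtain (cf. (5))

$$\mathbf{y} = \mathbf{M}\mathbf{s} + \mathbf{w} \tag{10}$$

where the code-channel generator matrix $\mathbf{M}$ is given by

$$\mathbf{M} \triangleq \theta (\mathbf{I}_T \otimes \mathbf{H}) \mathbf{G} \in \mathbb{C}^{n_R T \times \kappa}. \tag{11}$$

We can thus, instead of solving (8) directly, equivalently obtain
an estimate of $\mathbf{s}$ through

$$\hat{\mathbf{s}}_{ML} = \arg\min_{\mathbf{s} \in \mathbb{S}_\eta^\kappa} \|\mathbf{y} - \mathbf{M}\mathbf{s}\|^2 \tag{12}$$

where (12) is an optimization problem suitable for the sphere
decoder, and then easily recover $\hat{\mathbf{X}}_{ML}$ from $\hat{\mathbf{s}}_{ML}$.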
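The vectorized model rests on the identity $\mathrm{vec}(\mathbf{H}\mathbf{X}) = (\mathbf{I}_T \otimes \mathbf{H})\,\mathrm{vec}(\mathbf{X})$. The following is a small self-contained check of this identity using nested lists of complex numbers; the helper names and the example matrices are assumptions made for illustration, with small integer entries so that the comparison is exact.

```python
def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    """Matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def vec(X):
    """Stack the columns of X on top of each other."""
    return [X[i][j] for j in range(len(X[0])) for i in range(len(X))]


def identity(n):
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product: block (i, j) of the result equals A[i][j] * B."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


# Toy example with n_R = 3, n_T = 2 and T = 3 (values chosen arbitrarily).
H = [[1 + 1j, 2], [0, 1j], [3, -1]]
X = [[1, 1j, 0], [2, -1, 1 - 1j]]
T = len(X[0])

lhs = vec(matmul(H, X))
rhs = matvec(kron(identity(T), H), vec(X))

if __name__ == "__main__":
    print(lhs == rhs)
```

Since $\mathbf{I}_T \otimes \mathbf{H}$ is block diagonal with $T$ copies of $\mathbf{H}$, each block simply maps one column of $\mathbf{X}$ to the corresponding column of $\mathbf{H}\mathbf{X}$.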

A. The Sphere Decoder 

The sphere decoding algorithm solves (12) by a branch-and-bound
search on a regular tree. Detailed descriptions of
the algorithm are found in [1] and the semi-tutorial papers
[2]-[4], and most implementation issues will not be repeated
herein. However, in order to make our results precise and to
introduce notation we need to review some of the key ideas
as they apply to (12).

To this end, note that by the rotational invariance of the
Euclidean norm it follows that (12) is equivalent to

$$\hat{\mathbf{s}}_{ML} = \arg\min_{\mathbf{s} \in \mathbb{S}_\eta^\kappa} \|\mathbf{r} - \mathbf{R}\mathbf{s}\|^2 \tag{13}$$

where $\mathbf{Q}\mathbf{R} = \mathbf{M}$ is the thin QR-decomposition of $\mathbf{M}$ (i.e.,
$\mathbf{Q} \in \mathbb{C}^{n_R T \times \kappa}$ is unitary and $\mathbf{R} \in \mathbb{C}^{\kappa \times \kappa}$ is upper triangular)
and where $\mathbf{r} = \mathbf{Q}^H \mathbf{y}$. The sphere decoder solves (13) by
enumerating symbol vectors $\mathbf{s} \in \mathbb{S}_\eta^\kappa$ within a given sphere of
radius $\xi > 0$, i.e., $\mathbf{s}$ that satisfy

$$\|\mathbf{r} - \mathbf{R}\mathbf{s}\|^2 \le \xi^2. \tag{14}$$

If (14) is satisfied for at least one $\mathbf{s} \in \mathbb{S}_\eta^\kappa$, then also the
ML solution must satisfy (14), as the ML solution yields the
minimum metric in (12). The set of vectors that satisfy (14)
is found by recursively considering partial symbol vectors
$\mathbf{s}_k \in \mathbb{S}_\eta^k$ for $k = 1, \ldots, \kappa$. Specifically, if $\mathbf{s}_k$ is the vector
containing the last $k$ components of $\mathbf{s}$, a necessary condition
for (14) to be satisfied is given by

$$\|\mathbf{r}_k - \mathbf{R}_k \mathbf{s}_k\|^2 \le \xi^2 \tag{15}$$

where $\mathbf{r}_k \in \mathbb{C}^k$ denotes the last $k$ components of $\mathbf{r}$, and
where $\mathbf{R}_k \in \mathbb{C}^{k \times k}$ denotes the $k \times k$ lower right corner of
$\mathbf{R}$. This follows due to the upper triangularity of $\mathbf{R}$. Any set
of vectors $\mathbf{s} \in \mathbb{S}_\eta^\kappa$ with common last $k$ components that fail
to satisfy (15) may be excluded from the set of ML candidate
vectors. Enumerating all partial symbol vectors that satisfy
(15), beginning with $k = 1$, extending these to $k = 2$ and so
on, yields a recursive procedure for enumerating all $\mathbf{s} \in \mathbb{S}_\eta^\kappa$
that satisfy (14).

The enumeration of partial symbol vectors $\mathbf{s}_k$ is equivalent
to the traversal of a regular tree with $\kappa$ layers - one per symbol
$s_k$ where $s_k$ is the $k$th component of $\mathbf{s}$ - and $|\mathbb{S}_\eta|$ children
per node [4]. There is a one-to-one correspondence between
the nodes at layer $k$ (the layers are enumerated with the root
node corresponding to $k = 0$) and the partial vectors $\mathbf{s}_k$. We
say that a node is visited by the sphere decoder if and only if
the corresponding partial vector $\mathbf{s}_k$ satisfies (15), i.e., there is
a bijection between the visited nodes at layer $k$ and the set

$$\mathcal{N}_k \triangleq \{ \mathbf{s}_k \in \mathbb{S}_\eta^k \mid \|\mathbf{r}_k - \mathbf{R}_k \mathbf{s}_k\|^2 \le \xi^2 \}. \tag{16}$$

Due to this relation we will in what follows not make the
distinction between nodes and partial symbol vectors and
simply refer to $\mathbf{s}_k$ as nodes at layer $k$ when discussing the
search. The total number of visited nodes (in all layers of the
tree) is given by

$$N = \sum_{k=1}^{\kappa} N_k, \tag{17}$$

where $N_k = |\mathcal{N}_k|$ is the number of visited nodes at layer $k$ of
the search tree. The total number of visited nodes is commonly
taken as a measure of the sphere decoder complexity (see
[2], [3], [9]-[11]) and this will also be done in what follows.
Note however that as the total number of flops required for
evaluating the bound in (15) may be upper and lower bounded
by constants that are independent of $\rho$ [18], our results relating
to the SD complexity exponent would not change if we instead
considered $N$ to be the number of flops spent by the decoder.
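The node-counting convention in (16) and (17) can be sketched as a depth-first enumeration with a fixed radius. The code below is a minimal illustration under stated assumptions, not the authors' implementation: `count_visited_nodes` is a hypothetical name, $\mathbf{R}$ and $\mathbf{r}$ are supplied directly rather than computed by a QR-decomposition, $\eta$ is taken to be an integer, and no column ordering or radius update is applied.

```python
def count_visited_nodes(R, r, eta, xi2):
    """Return [N_1, ..., N_kappa], the number of visited nodes per layer.

    R   : kappa x kappa upper triangular matrix (list of rows)
    r   : vector of length kappa
    eta : integer bound on real and imaginary parts of each symbol, as in (4)
    xi2 : squared search radius
    A node at layer k is a choice of the last k symbols satisfying (15).
    """
    kappa = len(r)
    alphabet = [complex(a, b)
                for a in range(-eta, eta + 1)
                for b in range(-eta, eta + 1)]
    counts = [0] * kappa

    def extend(tail, metric):
        k = len(tail)          # tail holds the last k symbols of s
        if k == kappa:
            return
        row = kappa - k - 1    # row of R that becomes active at layer k + 1
        for symbol in alphabet:
            candidate = [symbol] + tail
            residual = r[row] - sum(R[row][row + j] * candidate[j]
                                    for j in range(k + 1))
            new_metric = metric + abs(residual) ** 2
            if new_metric <= xi2:      # partial vector satisfies (15)
                counts[k] += 1         # counts[k] stores N_{k+1}
                extend(candidate, new_metric)

    extend([], 0.0)
    return counts


if __name__ == "__main__":
    identity = [[1, 0], [0, 1]]
    print(count_visited_nodes(identity, [0, 0], eta=1, xi2=1.0))
    zero = [[0, 0], [0, 0]]
    print(count_visited_nodes(zero, [0, 0], eta=1, xi2=1.0))
```

The second call, with an all-zero $\mathbf{R}$, visits every node of the tree, which is the full-search behaviour associated with the worst case.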

B. The search radius 

The description of the sphere decoder is not complete
without specifying how the search radius $\xi$ is selected. In the
interest of obtaining the SD complexity exponent, we may
argue that any reasonable choice of a fixed (non-random)
search radius should satisfy

$$\xi \doteq \rho^0. \tag{18}$$

To see this, it is sufficient to note that the metric in (13)
satisfies

$$\|\mathbf{r} - \mathbf{R}\mathbf{s}\|^2 = \|\mathbf{Q}^H \mathbf{w}\|^2$$

for the transmitted vector $\mathbf{s}$. Thus, if $\|\mathbf{Q}^H \mathbf{w}\|^2 > \xi^2$ the transmitted
symbol vector is excluded from the search, resulting in
a decoding error. By considering a radius that grows slowly
with SNR, say $\xi^2 = z \log\rho \doteq \rho^0$, it can be shown that

$$P(\|\mathbf{Q}^H \mathbf{w}\|^2 > \xi^2) \doteq \rho^{-z} \tag{19}$$

for $z > 0$, i.e., by selecting $z > d(r)$ the probability
of excluding the transmitted vector will (for increasing $\rho$)
vanish faster than the probability of error and cause vanishing
degradation of the overall probability of error. At the same
time, if the radius does not tend to infinity with increasing $\rho$,
it will follow that $P(\|\mathbf{Q}^H \mathbf{w}\|^2 > \xi^2)$ is bounded away from
zero. This implies a non-vanishing probability of error and a
resulting diversity gain of zero, which is clearly undesirable.
Thus, as the complexity exponent is not affected by the
particular choice of $z$, we shall for simplicity, unless otherwise
stated, assume in the following that $\xi^2 = z \log\rho \doteq \rho^0$,
with $z > d(r)$ in order to ensure vanishing degradation to
the overall probability of error. This said, the derived SD
complexity exponent would be the same if we considered
adaptive radius updates as used in the Schnorr-Euchner (SE)
implementation [2], [3]. This may be shown by following the
argument in [10], and we give a proof of this statement in the
present setting in Appendix B-D.
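The decay in (19) can be illustrated numerically under explicit assumptions that go beyond the text: the noise entries are taken as unit-variance circularly symmetric complex Gaussian, so that $\|\mathbf{Q}^H \mathbf{w}\|^2$ is Gamma distributed with shape $\kappa$ and unit scale, and the logarithm in the radius rule is taken as the natural logarithm. The function names are hypothetical.

```python
import math


def log_tail_probability(kappa: int, t: float) -> float:
    """Natural log of P(G > t) for G ~ Gamma(shape=kappa, scale=1).

    Uses the closed form P(G > t) = exp(-t) * sum_{j < kappa} t^j / j!,
    evaluated in the log domain to avoid underflow.
    """
    partial_sum = sum(t ** j / math.factorial(j) for j in range(kappa))
    return -t + math.log(partial_sum)


def exclusion_exponent(kappa: int, z: float, rho: float) -> float:
    """Finite-SNR exponent of the probability that the transmitted vector
    falls outside a sphere of squared radius z * ln(rho)."""
    xi2 = z * math.log(rho)
    return log_tail_probability(kappa, xi2) / math.log(rho)


if __name__ == "__main__":
    for rho in (1e10, 1e50, 1e100):
        print(rho, exclusion_exponent(kappa=2, z=3.0, rho=rho))
```

Under these assumptions the exponent tends to $-z$ from above, with the polynomial factor in the Gamma tail accounting for the slow convergence.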

C. Decoding Complexity 

The sphere decoder complexity, or equivalently the number
of visited nodes $N$, is as stated a random variable with a
distribution that depends on a number of parameters, e.g., the
system dimensions $n_R$, $n_T$ and $T$, the SNR $\rho$, the multiplexing
gain $r$, the generator matrix $\mathbf{G}$, and the search radius $\xi$. This
is well known and follows by the randomness of the bound
in (15). Naturally this randomness must be considered when
properly analyzing the sphere decoder complexity, unless one
resorts to a worst-case analysis. However, we argue that the
worst-case analysis is unnecessarily pessimistic.

In order to illustrate one of the key problems with focusing
on the worst-case complexity, consider the event that $\mathbf{H} = \mathbf{0}$
and $\|\mathbf{w}\|^2 \le \xi^2$. In this case it is easily seen that (14) and
(15) are always satisfied. As a consequence, the complexity
of the sphere decoder would be equal to

$$N = \sum_{k=1}^{\kappa} N_k = \sum_{k=1}^{\kappa} |\mathbb{S}_\eta|^k \doteq \rho^{rT}$$

where we have used that $\eta \doteq \rho^{rT/(2\kappa)}$ to obtain the size of $\mathbb{S}_\eta$
in (4). The worst-case complexity is therefore comparable to
that of a full search over $\mathcal{X}$, as $|\mathcal{X}| \doteq \rho^{rT}$. However, there
is also no point in decoding when $\mathbf{H} = \mathbf{0}$, as all codewords
would yield the same ML metric, which in turn implies a high
probability of error. Essentially the same argument, for opting
out of decoding, can be made whenever the MIMO channel
is in information theoretic outage [5]. In this case it follows
by Fano's inequality that the probability of decoding error
will be bounded away from zero. In fact, for a code with
a diversity gain of $d(r)$, any set of channel matrices $\mathcal{H}$ for
which $P(\mathbf{H} \in \mathcal{H}) \dot{<} \rho^{-d(r)}$ may be neglected by the decoder
with vanishing degradation of the overall probability of error.
However, rather than identifying and excluding a set of bad
channel matrices directly, a more pragmatic approach is to
impose a run-time constraint on the decoder and ensure that
this constraint is such that the probability of it being violated
is insignificant in relation to the probability of error. This leads
to the following measure of the decoding complexity, which
we will use throughout.

Definition 1: Let

$$\Psi(x) \triangleq -\lim_{\rho\to\infty} \frac{\log P(N \ge \rho^x)}{\log\rho} \tag{20}$$

where $N$ is the number of nodes visited by the sphere decoder
(cf. (17)). The SD complexity exponent is then given by

$$c(r) \triangleq \inf\{ x \mid \Psi(x) > d(r) \} \tag{21}$$

where $d(r)$ (cf. (9)) is the diversity gain of the code at
multiplexing gain $r$.
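As an illustration of (21), the infimum can be located by bisection once a rate function is available. The sketch below assumes a non-decreasing $\Psi$ and uses a purely hypothetical linear rate function $\Psi(x) = 2x$ as input; neither the function name nor this choice of $\Psi$ comes from the analysis.

```python
def complexity_exponent(psi, d, x_max, tol=1e-9):
    """Approximate inf{x in [0, x_max] : psi(x) > d} for non-decreasing psi.

    Returns 0.0 if psi already exceeds d at x = 0 and infinity if psi never
    exceeds d on [0, x_max] (the set in (21) is then empty on this range).
    """
    if psi(0.0) > d:
        return 0.0
    if not psi(x_max) > d:
        return float("inf")
    lo, hi = 0.0, x_max            # invariant: psi(lo) <= d < psi(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid) > d:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Hypothetical rate function and diversity gain, for illustration only.
    print(complexity_exponent(lambda x: 2.0 * x, d=1.0, x_max=4.0))
```

With $\Psi(x) = 2x$ and $d = 1$ the routine returns a value close to $0.5$: thresholds $\rho^x$ with $x$ above this point are exceeded with a probability that decays faster than the error probability.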

D. A vanishing gap to the ML performance 

In order to illustrate the operational significance of $c(r)$, we
recall that in addition to the instances where the ML decoder
makes an incorrect decision, a time-limited sphere decoder
can additionally make decoding errors when the search radius
is selected too small, i.e., when $\mathcal{N}_\kappa = \emptyset$ (cf. (16)), or when
the run-time limit of $\rho^x$ becomes active, i.e., when $N \ge \rho^x$.
These extra errors cause a gap to ML performance which can
be quantified as

$$g(x) \triangleq \frac{P(\{\hat{\mathbf{X}}_{ML} \ne \mathbf{X}\} \cup \{\mathcal{N}_\kappa = \emptyset\} \cup \{N \ge \rho^x\})}{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}$$

describing the ratio between the probability of error of the
time-limited sphere decoder and the ML decoder. With respect
to $c(r)$ we then have the following.

Theorem 1: A sphere decoder with a computational constraint
activated at $\rho^x$ flops allows for a vanishing gap to ML
performance for all $x > c(r)$, i.e.,

$$\lim_{\rho\to\infty} g(x) = 1, \quad \text{for any } x > c(r). \tag{22}$$

The above simply states that for any $x > c(r)$ it is possible to
design a decoder based on the SD algorithm that achieves
a vanishing SNR gap to the ML decoder, at a worst-case
complexity of $\rho^x$. To see this, apply the union bound to get

$$g(x) \le \underbrace{\frac{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}}_{=1} + \underbrace{\frac{P(\mathcal{N}_\kappa = \emptyset)}{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}}_{\to 0} + \underbrace{\frac{P(N \ge \rho^x)}{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})}}_{\to 0}$$

where the second and third terms tend to zero with increasing
$\rho$ as the numerator tends to zero at a faster rate than the
denominator, cf. (19), (21). This immediately translates to a
vanishing SNR gap to the ML decoder at high SNR. In short,
the probabilities of the events that the search space is empty
or that the complexity of the run-time-unconstrained sphere
decoder exceeds $\rho^x$ are insignificant in comparison to the
probability of ML decoding errors.

Furthermore, one cannot in general time-limit the sphere
decoder to $\rho^x$ for some $x < c(r)$ and expect an arbitrarily
small gap to ML performance. Specifically, one can show (cf.
(68)) that $P(N \ge \rho^x) \dot{>} \rho^{-d(r)}$ for any $x < c(r)$, and as a
result, it follows^6 for $x < c(r)$ that

$$g(x) \ge \frac{P(N \ge \rho^x)}{P(\hat{\mathbf{X}}_{ML} \ne \mathbf{X})} \to \infty$$

^6 Note here that what we formally show is that under the basic technical
conditions of Lemma 2 one cannot time-limit the decoder to $\rho^x$ for any
$x < c(r)$. The same statement naturally holds whenever $\Psi(x)$ is strictly
increasing in $x$.



implying that any attempt to significantly reduce the complexity
below $\rho^{c(r)}$ will be at the expense of the vanishing SNR
gap to ML decoding.

IV. The sphere decoder complexity exponent 

We proceed to establish upper and lower bounds on the SD
complexity exponent, in essence through the application of a
principle (dating back to Gauss) which states that the number
of integer lattice points within a (large) set is well approximated
by the volume of the set [26], [27]. Thus, in order to
approximate the number of nodes at layer $k$ of the search tree,
i.e., the size of $\mathcal{N}_k$ defined in (16), we are primarily concerned
with the volume of $([-\eta, \eta] + \sqrt{-1}\,[-\eta, \eta])^k \cap \mathcal{E}_k$, where
$\mathcal{E}_k$ is the elliptical set given by (cf. (16))

$$\mathcal{E}_k \triangleq \{ \mathbf{s}_k \in \mathbb{C}^k \mid \|\mathbf{r}_k - \mathbf{R}_k \mathbf{s}_k\|^2 \le \xi^2 \}. \tag{23}$$

The volume principle was previously used for assessing the sphere
decoder complexity in [11], [28], although
its prior use in the communications literature is limited
to the case of lattice decoding (i.e., where the constellation
boundary constraint imposed by $[-\eta, \eta]$ is ignored by the
decoder). Herein, we have to take the constellation boundary
into account to obtain tight bounds on the SD complexity
exponent.

The upper and lower bounds presented in this section
are essentially obtained in three main steps: 1) the volume
principle is used to obtain an expression for the number $N_k$
of visited nodes at layer $k$ in terms of the singular values
of $R_k$; 2) the singular values of $R_k$ for $k = 1, \ldots, \kappa$ are
related to the singular values of the channel matrix $H$; and
3) the theory of large deviations is used similarly to [5] to
identify random events likely to cause an atypically large
decoding complexity. Establishing the upper bound on $c(r)$
turns out to be easier mathematically. The reason for this is
primarily in the second step where the interlacing property
of singular values of sub-matrices [22] can be used to lower
bound the singular values of $R_k$ by the singular values of $H$,
to yield results that are universally applicable for any full rank
generator matrix $G$, cf. Theorem 2. Although the interlacing
property gives both upper and lower bounds on the singular
values of $R_k$, the upper bounds are unfortunately not sufficient
for establishing tight lower bounds on $c(r)$. We are therefore
forced to develop tighter bounds that depend on some technical
assumptions on $G$, cf. Lemma 2. While these conditions are,
at least in principle, easily verified for any given code design,
they are generally hard to verify for arbitrary classes of codes.
Nevertheless, for some important classes discussed in Section
IV-D we are able to conclude that the upper bound on $c(r)$ is
tight, thereby establishing $c(r)$ exactly.

The derivation of the upper bound on $c(r)$ is given in
the following sub-sections, while the derivation of the lower
bound, which is similar in spirit to the upper bound but
complicated by some technical details, is primarily given in
Appendix B and discussed in Section IV-D.

A. The volume principle 

As noted, we begin by establishing bounds on $N_k = |\mathcal{N}_k|$
in terms of the singular values of the matrix $R_k$ in (15),
the sphere radius $\xi$ and the constellation size $\eta$. To this end,
consider the following lemma, which corresponds to a rigorous
application of the volume principle discussed above. The
proof of the lemma is given in Appendix A.

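Lemma 1: Let $\mathcal{E} \subset \mathbb{R}^n$ be the ellipsoidal set given by

$$\mathcal{E} \triangleq \{\, d \in \mathbb{R}^n \mid \|c - D d\|^2 \le \xi^2 \,\} \tag{24}$$

where $D \in \mathbb{R}^{m \times n}$ and $c \in \mathbb{R}^m$. Let $\mathcal{B} \subset \mathbb{R}^n$ be the hypercube
given by

$$\mathcal{B} \triangleq \{\, d \in \mathbb{R}^n \mid |d_i| \le \eta,\ i = 1, \ldots, n \,\}. \tag{25}$$

Then, the number of integer points contained in the intersection
of $\mathcal{E}$ and $\mathcal{B}$ is upper bounded as

$$|\mathcal{E} \cap \mathcal{B} \cap \mathbb{Z}^n| \le \prod_{i=1}^{n} \left[ \sqrt{n} + \min\left( \frac{2\xi}{\sigma_i(D)},\ 2\sqrt{n}\,\eta \right) \right] \tag{26}$$

and the number of integer points contained in $\mathcal{E}$ is lower
bounded by

$$|\mathcal{E} \cap \mathbb{Z}^n| \ge \prod_{i=1}^{n} \left[ \frac{2\xi}{\sqrt{n}\,\sigma_i(D)} - \sqrt{n} \right]^{+} \tag{27}$$

where $\sigma_i(D)$, $i = 1, \ldots, n$ denote the singular values of $D$.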
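
As a sanity check of the volume principle, the following minimal Python sketch brute-forces the number of integer points in $\mathcal{E} \cap \mathcal{B}$ for $n = 2$ and compares it with the product bound $\prod_i [\sqrt{n} + \min(2\xi/\sigma_i(D),\, 2\sqrt{n}\,\eta)]$. The values of $D$, $c$, $\xi$ and $\eta$ are arbitrary illustrative choices and the helper names are hypothetical; the sketch is not a substitute for the proof in Appendix A.

```python
import math
from itertools import product


def singular_values_2x2(D):
    """Singular values (ascending) of a real 2x2 matrix from the eigenvalues of D^T D."""
    (a, b), (c, d) = D
    trace = a * a + b * b + c * c + d * d      # trace of D^T D
    det = (a * d - b * c) ** 2                 # determinant of D^T D
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return [math.sqrt(max((trace - disc) / 2.0, 0.0)),
            math.sqrt((trace + disc) / 2.0)]


def count_points(D, c, xi, eta):
    """Brute-force count of integer d with ||c - D d||^2 <= xi^2 and |d_i| <= eta."""
    m = math.floor(eta)
    count = 0
    for d1, d2 in product(range(-m, m + 1), repeat=2):
        e1 = c[0] - (D[0][0] * d1 + D[0][1] * d2)
        e2 = c[1] - (D[1][0] * d1 + D[1][1] * d2)
        if e1 * e1 + e2 * e2 <= xi * xi:
            count += 1
    return count


def volume_bound(D, xi, eta):
    """Product bound prod_i [sqrt(n) + min(2 xi / sigma_i, 2 sqrt(n) eta)] for n = 2."""
    n = 2
    bound = 1.0
    for sigma in singular_values_2x2(D):
        sphere_term = 2.0 * xi / sigma if sigma > 0.0 else math.inf
        bound *= math.sqrt(n) + min(sphere_term, 2.0 * math.sqrt(n) * eta)
    return bound


# Illustrative values only (not taken from the paper).
D = [[1.0, 0.4], [0.0, 0.3]]
c = [0.2, -0.1]
xi, eta = 1.5, 4.0
print(count_points(D, c, xi, eta), volume_bound(D, xi, eta))
```

The brute-force count should never exceed the product bound; the gap between the two reflects the additive $\sqrt{n}$ edge term.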

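Although Lemma 1 is phrased in terms of real valued quantities,
it is easily applied to complex valued sets by considering
each complex dimension as two real valued dimensions. In
particular, the constraint $\|r_k - R_k s_k\|^2 \le \xi^2$ defining $\mathcal{E}_k$ in (23)
is equivalent to

$$\|\tilde{r}_k - \tilde{R}_k \tilde{s}_k\|^2 \le \xi^2$$

where

$$\tilde{r}_k \triangleq \begin{bmatrix} \Re(r_k) \\ \Im(r_k) \end{bmatrix}, \qquad \tilde{s}_k \triangleq \begin{bmatrix} \Re(s_k) \\ \Im(s_k) \end{bmatrix}$$

and

$$\tilde{R}_k \triangleq \begin{bmatrix} \Re(R_k) & -\Im(R_k) \\ \Im(R_k) & \Re(R_k) \end{bmatrix}.$$

By noting that if $R_k = U \Sigma V^H$ is the singular value
decomposition (SVD) [22] of $R_k$, then

$$\tilde{R}_k = \begin{bmatrix} \Re(U) & -\Im(U) \\ \Im(U) & \Re(U) \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & \Sigma \end{bmatrix} \begin{bmatrix} \Re(V) & -\Im(V) \\ \Im(V) & \Re(V) \end{bmatrix}^T$$

is an SVD of $\tilde{R}_k$, it follows that the singular values of $\tilde{R}_k$
are the same as those of $R_k$ albeit with a multiplicity of 2.
Thus, applying (26) in Lemma 1 to $\mathcal{N}_k$ (cf. (15)) yields an
upper bound on the number of nodes visited at layer $k$, which
is given as

$$N_k \le \prod_{i=1}^{k} \left[ \sqrt{2k} + \min\left( \frac{2\xi}{\sigma_i(R_k)},\ 2\sqrt{2k}\,\eta \right) \right]^2 \tag{28}$$

where $\sigma_i(R_k)$, $i = 1, \ldots, k$ denote the singular values of
$R_k$. Here, in essence, the additive $\sqrt{2k}$ term accounts for
edge effects in the volume approximation, the first term in
the minimum accounts for the size of the search sphere, and
the second term in the minimum accounts for the constellation
boundary.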

The lower bound in (27) will be used later in order to assess
the tightness of the upper bound on $c(r)$ developed next. The
reason for providing a lower bound on $|\mathcal{E} \cap \mathbb{Z}^n|$ and not
$|\mathcal{E} \cap \mathcal{B} \cap \mathbb{Z}^n|$ is that we cannot a priori rule out that $c$ in (24) is
such that $\mathcal{E} \cap \mathcal{B} = \emptyset$, a case which if not ruled out would lead
to the trivial lower bound $|\mathcal{E} \cap \mathcal{B} \cap \mathbb{Z}^n| \ge 0$.

B. Singular value bounds 

The interlacing theorem of singular values of sub-matrices
(cf. [22, Th. 7.3.9] and [25, Corollary 3.1.3]) states that the
singular values of $R_k$ are bounded by the singular values of
$R$ according to

$$\sigma_i(R) \le \sigma_i(R_k) \le \sigma_{i+\kappa-k}(R), \quad i = 1, \ldots, k, \tag{29}$$

where $\sigma_i(R_k)$ and $\sigma_i(R)$ denote the $i$th singular values of
$R_k$ and $R$ respectively. As $R = Q^H M$ where $Q$ has a set of
orthogonal columns that span the range of $M$ it follows that
$\sigma_i(R) = \sigma_i(M)$. Further, by the definition of $M$ in (11) we
have that $\sigma_i(M) \ge \theta \gamma\, \sigma_i(I_T \otimes H)$ where $\gamma = \sigma_1(G) > 0$ due
to the assumption that $G$ is full rank. The singular values of
$I_T \otimes H$ are the same as those of the channel matrix $H$ in
(1), albeit with a multiplicity of $T$, i.e.,

$$\sigma_i(I_T \otimes H) = \sigma_{\iota_T(i)}(H)$$

where

$$\iota_T(i) \triangleq \left\lceil \frac{i}{T} \right\rceil. \tag{30}$$

This can be seen by noting that if $H = U \Sigma V^H$ is the SVD
of $H$, then

$$(I_T \otimes U)(I_T \otimes \Sigma)(I_T \otimes V^H)$$

is an SVD of $I_T \otimes H$ (albeit with a non-standard ordering of
the singular values). Alternatively, one can apply [25, Theorem
4.2.12] to the eigenvalues of $I_T \otimes H^H H$.

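Combining the above yields a lower bound on the singular
values of $R_k$ in terms of the singular values of the channel
matrix $H$ according to

$$\sigma_i(R_k) \ge \theta \gamma\, \sigma_{\iota_T(i)}(H), \quad i = 1, \ldots, k,$$

and an upper bound on the number of nodes visited by the
sphere decoder at layer $k$ according to

$$N_k \le \prod_{i=1}^{k} \left[ \sqrt{2k} + \min\left( \frac{2\xi}{\theta \gamma\, \sigma_{\iota_T(i)}(H)},\ 2\sqrt{2k}\,\eta \right) \right]^2. \tag{31}$$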



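In order to bound the probability that the right hand side of
(31) is atypically large in the high SNR regime, it is useful to
consider the SNR dependent parameterization of the singular
values (or eigenvalues) of $H^H H$ introduced in [5], i.e., SNR
dependent random variables $\alpha_i$, for $i = 1, \ldots, n_T$, defined by

$$\alpha_i \triangleq -\frac{\log \sigma_i(H^H H)}{\log \rho}. \tag{32}$$

Note that by this definition $\sigma_i(H) = \rho^{-\frac{1}{2}\alpha_i}$. The variables
$\alpha_i$ for $i = 1, \ldots, n_T$ are referred to as the singularity levels
of $H$ as they give an indication of how close to singular the
channel $H$ is in relation to the inverse SNR $\rho^{-1}$. As
$\theta = \rho^{\frac{1}{2} - \frac{rT}{2\kappa}}$ and $\eta = \rho^{\frac{rT}{2\kappa}}$, it holds that

$$\left[ \sqrt{2k} + \min\left( \frac{2\xi}{\theta \gamma\, \sigma_{\iota_T(i)}(H)},\ 2\sqrt{2k}\,\eta \right) \right]^2 \doteq \rho^{\nu_i}$$

where

$$\nu_i \triangleq \min\left( \frac{rT}{\kappa} - 1 + \alpha_{\iota_T(i)},\ \frac{rT}{\kappa} \right)^{+}. \tag{33}$$

By (31) it follows that

$$N_k \,\dot{\le}\, \rho^{\sum_{i=1}^{k} \nu_i}. \tag{34}$$

However, as the SNR exponent on the right hand side of (34)
is non-decreasing in $k$ it must for the total number of visited
nodes $N$ hold that $N = \sum_{k=1}^{\kappa} N_k \,\dot{\le}\, \rho^{\sum_{i=1}^{\kappa} \nu_i}$, or for any given
$\delta > 0$ hold that

$$N \le \rho^{\sum_{i=1}^{\kappa} \nu_i + \delta} \tag{35}$$

provided $\rho$ is sufficiently large.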
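
To make the exponent bookkeeping concrete, the following Python sketch evaluates the per-dimension exponents $\nu_i = \min(rT/\kappa - 1 + \alpha_{\iota_T(i)},\, rT/\kappa)^+$ with $\iota_T(i) = \lceil i/T \rceil$, together with the cumulative exponents that bound $N_k$ layer by layer. The function names are hypothetical, and the example values ($n_T = T = 2$, $\kappa = 4$, $r = 1$, $\alpha = (1, 0)$) are chosen purely for illustration.

```python
def iota(i, T):
    """Index map iota_T(i) = ceil(i / T) for a 1-based index i."""
    return -(-i // T)


def nu(alpha, r, T, kappa):
    """Exponents nu_i = min(rT/kappa - 1 + alpha_{iota_T(i)}, rT/kappa)^+, i = 1..kappa."""
    base = r * T / kappa
    return [max(min(base - 1.0 + alpha[iota(i, T) - 1], base), 0.0)
            for i in range(1, kappa + 1)]


def layer_exponents(alpha, r, T, kappa):
    """Cumulative sums sum_{i<=k} nu_i, the SNR exponent bounding N_k, for k = 1..kappa."""
    totals, running = [], 0.0
    for value in nu(alpha, r, T, kappa):
        running += value
        totals.append(running)
    return totals


# Illustrative case: n_T = T = 2 (kappa = 4), r = 1, singularity levels (1, 0).
alpha = (1.0, 0.0)
print(nu(alpha, r=1.0, T=2, kappa=4))
print(layer_exponents(alpha, r=1.0, T=2, kappa=4))
```

In this example the cumulative exponent stops growing after the second layer, which illustrates why the last entry alone governs the exponent of the total node count $N$.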

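Consider now the set (cf. (33) and (35))

$$\mathcal{T}(x) \triangleq \left\{ \alpha \ \middle|\ \sum_{i=1}^{\kappa} \min\left( \frac{rT}{\kappa} - 1 + \alpha_{\iota_T(i)},\ \frac{rT}{\kappa} \right)^{+} \ge x \right\} \tag{36}$$

where $\alpha = (\alpha_1, \ldots, \alpha_{n_T})$. As (35) holds (asymptotically) for
any $\delta > 0$, and since $\alpha \notin \mathcal{T}(y)$ implies that $N < \rho^x$ for any
$y < x$ by (35) and (33), it follows that

$$\limsup_{\rho \to \infty} \frac{\log P(N \ge \rho^x)}{\log \rho} \le \limsup_{\rho \to \infty} \frac{\log P(\alpha \in \mathcal{T}(y))}{\log \rho}.$$

Equivalently, in terms of $\Psi(x)$,

$$\Psi(x) \ge -\limsup_{\rho \to \infty} \frac{\log P(\alpha \in \mathcal{T}(y))}{\log \rho} \tag{37}$$

for $y < x$. The value of the bound in (37) is that the right hand
side is readily computed using large deviation theory [29].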

C. Large deviations 

A sequence of random vectors $\beta_\rho \in \mathbb{R}^n$ parameterized by
$\rho$ is said to satisfy the large deviation principle [29] with rate
function $I$,

$$I : \mathbb{R}^n \to \mathbb{R}_+ \cup \{\infty\},$$

if for every open set $\mathcal{G} \subset \mathbb{R}^n$ it holds that

$$\liminf_{\rho \to \infty} \frac{\log P(\beta_\rho \in \mathcal{G})}{\log \rho} \ge -\inf_{\beta \in \mathcal{G}} I(\beta) \tag{38}$$

and if for every closed set $\mathcal{F} \subset \mathbb{R}^n$ it holds that

$$\limsup_{\rho \to \infty} \frac{\log P(\beta_\rho \in \mathcal{F})}{\log \rho} \le -\inf_{\beta \in \mathcal{F}} I(\beta). \tag{39}$$

Although not stated formally, one of the central results of [5]
is that the sequence of random variables given by $\alpha_\rho = \alpha =
(\alpha_1, \ldots, \alpha_{n_T})$ (cf. (32)) satisfies the large deviation principle
with rate function (see the proof of Theorem 4 in [5])

$$I(\alpha) = \sum_{i=1}^{n_T} (n_R - n_T + 2i - 1)\,\alpha_i \tag{40}$$

if $\alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0$ and $I(\alpha) = \infty$ otherwise. This
observation was key in establishing the DMT in [5].
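
The rate function is simple enough to evaluate directly. The sketch below, with a hypothetical function name, returns $\sum_i (n_R - n_T + 2i - 1)\alpha_i$ on the ordered cone $\alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0$ and infinity elsewhere; the sample inputs are illustrative only.

```python
import math


def rate_function(alpha, n_t, n_r):
    """I(alpha) = sum_i (n_r - n_t + 2i - 1) alpha_i on the ordered cone, else infinity."""
    if len(alpha) != n_t:
        raise ValueError("alpha must have n_t entries")
    ordered = all(alpha[i] >= alpha[i + 1] for i in range(n_t - 1)) and alpha[-1] >= 0
    if not ordered:
        return math.inf
    return sum((n_r - n_t + 2 * i - 1) * a for i, a in enumerate(alpha, start=1))


# For a 2 x 2 channel, alpha = (1, 0) costs 1 and alpha = (1, 1) costs 4.
print(rate_function((1, 0), 2, 2), rate_function((1, 1), 2, 2))
```

Because the weights grow with the index $i$, raising the largest singularity level $\alpha_1$ is the cheapest way to make the channel close to singular.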

By combining (37) with (39), and noting that $\mathcal{T}(y)$ is a
closed set, it follows that

$$\Psi(x) \ge f(y) \triangleq \inf_{\alpha \in \mathcal{T}(y)} I(\alpha) \tag{41}$$

for any $y < x$. As $\mathcal{T}(x) \subset \mathcal{T}(y)$ for all $y < x$ it follows that
$f(y)$ is non-decreasing and it can additionally be verified that
$f(y)$ is left-continuous, i.e.,

$$\sup_{y < x} f(y) = f(x),$$

which implies that

$$\Psi(x) \ge f(x) = \inf_{\alpha \in \mathcal{T}(x)} I(\alpha). \tag{42}$$

From (42) it follows that the complexity exponent $c(r)$ is upper
bounded by $\bar{c}(r)$ where

$$\bar{c}(r) \triangleq \inf\{x \mid f(x) > d(r)\} = \sup\{x \mid f(x) \le d(r)\} \tag{43}$$

and where the last equality follows as $f(x)$ is non-decreasing.
Further, by the left-continuity of $f(x)$ it follows that the
supremum on the right is attained, i.e., the supremum can be
replaced by a maximum.

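Note however that the condition that $f(x) \le d(r)$ is satisfied
if and only if there exists an $\alpha \in \mathcal{T}(x)$ such that $I(\alpha) \le
d(r)$. Thus, $\bar{c}(r)$ in (43) could alternatively be obtained as the
solution to a constrained maximization problem according to

\begin{align}
\max_{x,\,\alpha} \quad & x \tag{44a} \\
\text{s.t.} \quad & \sum_{i=1}^{\kappa} \min\left( \frac{rT}{\kappa} - 1 + \alpha_{\iota_T(i)},\ \frac{rT}{\kappa} \right)^{+} \ge x \tag{44b} \\
& \sum_{i=1}^{n_T} (n_R - n_T + 2i - 1)\,\alpha_i \le d(r) \tag{44c} \\
& \alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0, \tag{44d}
\end{align}

where (44b) follows from the constraint $\alpha \in \mathcal{T}(x)$, and
where (44c) and (44d) follow from $I(\alpha) \le d(r)$. It is
straightforward to show that the optimal $x$ in (44) must be
such that (44b) is satisfied with equality. By further noting
that the sum in (44b) contains only $n_T$ distinct terms, each
with multiplicity $T$, it can be seen that

$$\sum_{i=1}^{\kappa} \min\left( \frac{rT}{\kappa} - 1 + \alpha_{\iota_T(i)},\ \frac{rT}{\kappa} \right)^{+} = \sum_{i=1}^{n_T} T \min\left( \frac{r}{n_T} - 1 + \alpha_i,\ \frac{r}{n_T} \right)^{+}$$

where we have also used the full rate assumption that $\kappa =
n_T T$. We summarize the above in the following theorem.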

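Theorem 2: The SD complexity exponent $c(r)$ of decoding
any full rate linear dispersive code with multiplexing gain $r$
and diversity $d(r)$ is upper bounded as $c(r) \le \bar{c}(r)$ where

\begin{align}
\bar{c}(r) \triangleq \max_{\alpha} \quad & \sum_{i=1}^{n_T} T \min\left( \frac{r}{n_T} - 1 + \alpha_i,\ \frac{r}{n_T} \right)^{+} \tag{45a} \\
\text{s.t.} \quad & \sum_{i=1}^{n_T} (n_R - n_T + 2i - 1)\,\alpha_i \le d(r) \tag{45b} \\
& \alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0. \tag{45c}
\end{align}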



The upper bound given by Theorem 2 can naturally be
computed given explicit values for the multiplexing gain $r$
and diversity gain$^{7}$ $d(r)$. However, it is also possible in some
cases to give general solutions as a function of $r$ when the
DMT curve $d(r)$ of the code is known explicitly. In particular,
DMT optimal codes such as those presented in [12]-[17] have
a diversity gain of $d(k) = (n_T - k)(n_R - k)$ at any integer
multiplexing gain $r = k$ [5]. In this case it is straightforward
to verify that an$^{8}$ optimal $\alpha$ in (45) is given by

$$\alpha_i^* = 1, \quad \text{for } i = 1, \ldots, n_T - k$$

and

$$\alpha_i^* = 0, \quad \text{for } i = n_T - k + 1, \ldots, n_T.$$

To see this, note that the objective function in (45a) is
symmetric with respect to permutations of the set of $\alpha_i$ for
$i = 1, \ldots, n_T$. As the sum in the diversity constraint (45b)
places more weight on $\alpha_j$ than on $\alpha_i$, for $j > i$, it is
optimal to increase $\alpha_1$ until the term in (45a) containing $\alpha_1$
saturates (i.e., when $\alpha_1 = 1$), then to increase $\alpha_2$ etcetera,
until the constraint is satisfied with equality. This yields the
aforementioned solution. Note also that $\alpha_1^*, \ldots, \alpha_{n_T}^*$ are the
same singularity levels that give the typical outages in [5] (cf.
Section V-A). Inserting the optimal solution into (45a) yields

$$\bar{c}(k) = \frac{T k (n_T - k)}{n_T}$$

which is a remarkably simple upper bound on the SD complexity
exponent of decoding any DMT optimal code at an
integer multiplexing gain $r = k$.
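
The verification described as straightforward above can be written out in two lines. The following worked derivation is a sketch under the stated integer-gain setting ($r = k$, with $\alpha_i^\star = 1$ for $i \le n_T - k$ and $\alpha_i^\star = 0$ otherwise); it checks that the diversity constraint is met with equality and evaluates the objective, using only $\sum_{i=1}^{m}(2i-1) = m^2$ and $k \le n_T$.

```latex
\begin{align*}
\sum_{i=1}^{n_T} (n_R - n_T + 2i - 1)\,\alpha_i^\star
  &= \sum_{i=1}^{n_T-k} (n_R - n_T + 2i - 1) \\
  &= (n_T-k)(n_R-n_T) + (n_T-k)^2 \\
  &= (n_T-k)(n_R-k) = d(k), \\[4pt]
\sum_{i=1}^{n_T} T \min\Big(\tfrac{k}{n_T} - 1 + \alpha_i^\star,\ \tfrac{k}{n_T}\Big)^{+}
  &= T\,(n_T-k)\,\tfrac{k}{n_T} + T\,k\,\Big(\tfrac{k}{n_T}-1\Big)^{+}
   = \frac{T\,k\,(n_T-k)}{n_T}.
\end{align*}
```

The second term of the objective vanishes because $k/n_T - 1 \le 0$, so only the $n_T - k$ saturated singularity levels contribute.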

For a DMT optimal code at a possibly non-integer value of
$r$, let $k$ be the integer such that $r \in [k, k + 1)$, i.e., $k = \lfloor r \rfloor$.
The optimal solution is in this case given by

$$\alpha_i^* = 1, \quad \text{for } i = 1, \ldots, n_T - k - 1,$$

$$\alpha_i^* = 0, \quad \text{for } i = n_T - k + 1, \ldots, n_T,$$

and

$$\alpha_{n_T - k}^* = k + 1 - r.$$

Substituting the above solution back into (45a) yields

$$\bar{c}(r) = \frac{T}{n_T} \left( r(n_T - k - 1) + \big( n_T k - r(n_T - 1) \big)^{+} \right).$$

We summarize the above in the following theorem.

$^{7}$Note that Theorem 2 does not assume a diversity optimal code.
$^{8}$In general, (45) does not have a unique optimal point as $\min(a, b)^+$ is
constant in $a$ for $a < 0$ and $a > b$.



Theorem 3: The SD complexity exponent $c(r)$ of decoding
any DMT optimal full rate linear dispersive code with integer
multiplexing gain $r = k$ is upper bounded as

$$c(k) \le \bar{c}(k) = \frac{T k (n_T - k)}{n_T}. \tag{46}$$

For general $r$ where $0 \le r \le n_T$ the SD complexity exponent
$c(r)$ is upper bounded as $c(r) \le \bar{c}(r)$ where

$$\bar{c}(r) = \frac{T}{n_T} \left( r(n_T - \lfloor r \rfloor - 1) + \big( n_T \lfloor r \rfloor - r(n_T - 1) \big)^{+} \right). \tag{47}$$

The function $\bar{c}(r)$ in (47) is a piecewise linear function in
$r$, although slightly more involved than the set of straight
lines describing the optimal DMT $d(r)$. For $n_T = T = n$
the function in (47) coincides with the curve for $c(r)$ shown
in Fig. 1.
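
The closed form can be cross-checked numerically against the optimization problem of Theorem 2. The Python sketch below evaluates $\frac{T}{n_T}\big(r(n_T - \lfloor r \rfloor - 1) + (n_T\lfloor r \rfloor - r(n_T-1))^+\big)$ and, for the special case $n_T = 2$, compares it with a brute-force grid search over $\alpha_1 \ge \alpha_2 \ge 0$ under the diversity constraint. The function names, the grid resolution and the piecewise linear helper for the optimal DMT are illustrative assumptions, not part of the paper.

```python
import math


def c_bar_dmt(r, n_t, T):
    """Closed-form bound (T/n_t) * (r(n_t - k - 1) + (n_t k - r(n_t - 1))^+), k = floor(r)."""
    k = math.floor(r)
    return (T / n_t) * (r * (n_t - k - 1) + max(n_t * k - r * (n_t - 1), 0.0))


def dmt_optimal(r, n_t, n_r):
    """Piecewise linear curve through the points (k, (n_t - k)(n_r - k))."""
    k = min(math.floor(r), min(n_t, n_r) - 1)
    d_k = (n_t - k) * (n_r - k)
    d_k1 = (n_t - k - 1) * (n_r - k - 1)
    return d_k + (r - k) * (d_k1 - d_k)


def c_bar_grid(r, n_r, T, steps=40, alpha_max=2.0):
    """Brute-force maximization over a grid of alpha_1 >= alpha_2 >= 0 for n_t = 2."""
    n_t = 2
    d = dmt_optimal(r, n_t, n_r)
    best = 0.0
    for i in range(steps + 1):
        for j in range(i + 1):                      # j <= i keeps alpha_1 >= alpha_2
            alpha = (alpha_max * i / steps, alpha_max * j / steps)
            cost = sum((n_r - n_t + 2 * m - 1) * alpha[m - 1] for m in (1, 2))
            if cost <= d + 1e-9:                    # diversity constraint
                value = sum(T * max(min(r / n_t - 1 + a, r / n_t), 0.0) for a in alpha)
                best = max(best, value)
    return best


for r in (0.5, 1.0, 1.5):
    print(r, c_bar_dmt(r, 2, 2), c_bar_grid(r, n_r=2, T=2))
```

For $n_T = T = n$ and integer $r = k$ the closed form reduces to $k(n - k)$, so the bound is largest near $r = n/2$ and vanishes at both ends of the multiplexing gain range.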

D. Establishing the exact SD complexity exponent 

We now turn to specific cases where we can exactly establish
the SD complexity exponent $c(r)$ by establishing that
the upper bound $c(r) \le \bar{c}(r)$ is in fact tight. To this end, we
begin with the following lemma, which provides a sufficient
condition for $c(r) = \bar{c}(r)$, i.e., for the tightness of the upper
bound.

Lemma 2: Let $G|_p \in \mathbb{C}^{n_T T \times Tp}$ be the matrix consisting of
the first $Tp$ columns of the generator matrix $G \in \mathbb{C}^{n_T T \times \kappa}$. If
there exist, for $p = 1, \ldots, n_T$, unitary matrices $U_p \in \mathbb{C}^{n_T \times p}$
such that

$$\operatorname{rank}\big( (I_T \otimes U_p^H)\, G|_p \big) = pT, \tag{48}$$

then $c(r) = \bar{c}(r)$ for all $r \in [0, n_T]$, where $\bar{c}(r)$ is given by
(45) in Theorem 2.

The proof of Lemma 2 is similar in spirit to the proof
of Theorem 2, although riddled with technical details, and
therefore relegated to Appendix B. In essence, the condition
posed in (48) implies that there are certain orientations of the
right singular vectors of the channel $H$ (in relation to the code
generated by $G$) for which the lower bound in (29) is sufficiently
tight. Details are provided in Appendix B and some
additional discussions of (48) and the general applicability of
the lemma can be found in Section V. However, we first apply
Lemma 2 to find $c(r)$ in some very important special cases.

To this end, it is useful to first note that permuting the
columns of $G$, i.e., replacing $G$ with $G\Pi$ where $\Pi \in \mathbb{R}^{\kappa \times \kappa}$
is a permutation matrix, does not change the code $\mathcal{X}$. Instead,
the effect such a permutation would have is that it would 
change the order in which the symbols in s are enumerated by 
the sphere decoder described in Section IIII-AI (cf. [|3i Section 
IV]). In the present context, the first $pT$ columns of $G\Pi$, i.e.,
$[G\Pi]|_p$, may differ from those of $G$. Thus, we see that (48)
depends not only on the code itself, but also on the order in
which the constituent symbols $s_j$ are enumerated by the sphere
decoder (cf. lO, (lU where the topic of column ordering is 
discussed in detail). 

In the context of Lemma 2 it can be seen that
$(I_T \otimes U_p^H) G \in \mathbb{C}^{pT \times \kappa}$ has rank $pT$ for any unitary $U_p$
due to the full rank assumption on $G$. One can therefore
select $pT$ linearly independent columns, or equivalently, find
a permutation matrix $\Pi$ such that $(I_T \otimes U_p^H)[G\Pi]|_p$ has full
rank. Using a similar argument, we can recursively construct a
(single) $\Pi$ for which there are $U_p$ for $p = 1, \ldots, n_T$ satisfying

$$\operatorname{rank}\big( (I_T \otimes U_p^H)[G\Pi]|_p \big) = pT$$

by constructing $U_{p-1}$ from $U_p$ by removing a column,
selecting the appropriate columns from $G$, and starting the
recursion with an arbitrary $U_{n_T}$. Interpreting the above in
light of Theorem 2 and Lemma 2 we can thus establish that
for any full rate linear dispersive code design, $\bar{c}(r)$ as defined
in Theorem 2 and given in Theorem 3 for DMT optimal codes,
is the tightest upper bound on the SD complexity exponent
that can possibly hold under arbitrary column orderings. This
is formalized in the following.

Theorem 4: Given any full rate linear dispersive code
achieving diversity $d(r)$, there is always at least one column
ordering for which $c(r) = \bar{c}(r)$.

However, while Theorem 4 is useful in the sense that
it tells us that one could not improve upon the tightness
of $\bar{c}(r)$ without introducing further assumptions regarding
the particular code design considered, it is obviously not of
practical interest to use the worst possible column ordering.
Therefore, we turn our attention to the important class of
threaded codes [30] for which we will show that the natural
column ordering $\Pi = I$ implies $c(r) = \bar{c}(r)$.

1) Threaded codes: The class of threaded code designs is
of particular interest, as it includes full rate codes that perform
very well in a variety of settings. The threaded algebraic
space-time (TAST) codes [30], codes constructed from cyclic
division algebras (CDAs) [31], [32], and specifically modified
CDA codes [14]-[17] that were shown (cf. [15]) to achieve
the entire DMT, are prime examples. The CDA based threaded
designs are also the only currently known explicit constructions
capable of achieving the DMT for all values of $n_T$ and $n_R$
simultaneously over all $r \in [0, n_T]$. All these codes have a
common threaded structure. Specifically an $n \times n$ threaded
code is built from $n$ component codes mapped cyclically in
threads (or layers) to the codewords $X$. For example, in the
special case of $n = n_T = T = 4$, the thread structure is given
by



where the numbers 1,2,3,4 indicate the thread to which a 
particular entry of X belongs. In general, symbol j in thread I 
is mapped to ^ where k = mod {j — l, n) + l and where 
mod {■ , n) denotes the modulo n operation. For example, in 
the case of perfect codes [16], [17], which also employ a
threaded structure, the code follows from



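$$\operatorname{lay}(X) = \theta\, \Gamma s, \qquad \Gamma \triangleq \begin{bmatrix} B_0 C & & \\ & \ddots & \\ & & B_{n-1} C \end{bmatrix}, \qquad s \triangleq \begin{bmatrix} s^{(1)} \\ \vdots \\ s^{(n)} \end{bmatrix} \tag{49}$$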



where

$$B_i = \operatorname{Diag}(\underbrace{1, \ldots, 1}_{n - i \text{ entries}}, \underbrace{\gamma, \ldots, \gamma}_{i \text{ entries}}), \quad i = 0, \ldots, n - 1$$

are full rank diagonal matrices incorporating a properly chosen
thread-separating scalar $\gamma \in \mathbb{C}$, where $C \in \mathbb{C}^{n \times n}$ is a (unitary)
full rank generator matrix for the component code of each
thread, $s^{(l)} \in \mathbb{S}_\eta^n$ are the constellation symbols of thread $l$, and
where $\operatorname{lay}(X)$ denotes the matrix to vector operation obtained
by stacking the elements of $X$ according to their thread (cf.
the column based stacking of the $\operatorname{vec}(\cdot)$ operation).

Regarding the cost of decoding such codes, we note that
the corresponding generator matrix $G \in \mathbb{C}^{n^2 \times n^2}$ is related
to the block diagonal matrix $\Gamma$ in (49) through a permutation of
the rows in such a way that the $(i, j)$th block $G_{i,j} \in \mathbb{C}^{n \times n}$
of $G$ contains exactly
one non-zero row which itself is one of the rows of $B_{j-1} C$.
Consequently, $G|_{i,p} \triangleq [G_{i,1} \cdots G_{i,p}] \in \mathbb{C}^{n \times np}$ has rank $p$
and contains exactly $p$ non-zero rows. This holds for any $n$
and $p \le n$. Now, let $U_p \in \mathbb{C}^{n \times p}$ be a unitary matrix with
the property that any $p$ rows of $U_p$ are linearly independent.
Such matrices can clearly be constructed, and an example is
the matrix that contains the first $p$ discrete Fourier transform
(DFT) vectors of length $n$. Let $\bar{G}|_{i,p} \in \mathbb{C}^{p \times np}$ be the matrix
containing only the non-zero rows of $G|_{i,p} \in \mathbb{C}^{n \times np}$, and let
$\bar{U}_{i,p} \in \mathbb{C}^{p \times p}$ be the full rank matrix consisting of the rows of
$U_p$ matching the non-zero rows of $G|_{i,p}$. It follows that

$$(I_T \otimes U_p^H)\, G|_p = \begin{bmatrix} \bar{U}_{1,p}^H & & \\ & \ddots & \\ & & \bar{U}_{n,p}^H \end{bmatrix} \begin{bmatrix} \bar{G}|_{1,p} \\ \vdots \\ \bar{G}|_{n,p} \end{bmatrix} \triangleq \bar{U}_p^H\, \bar{G}|_p$$

is full rank as both $\bar{U}_p \in \mathbb{C}^{np \times np}$ and $\bar{G}|_p \in \mathbb{C}^{np \times np}$ are full
rank and square matrices. Note also that the same argument
can be made regardless of the ordering of the threads, and for
any other code with a threaded structure, provided the symbols
in $s$ are grouped into layers as in (49). This is stated in the
following.
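
The DFT example used in the argument above can be checked numerically. The Python sketch below builds the first $p$ columns of the (unnormalized) $n \times n$ DFT matrix and verifies that every $p \times p$ sub-matrix formed from $p$ of its rows is non-singular, which is the Vandermonde property the argument relies on. The helper names and the tolerance are illustrative assumptions; the scaling by $1/\sqrt{n}$ needed for unitarity is omitted since it does not affect rank.

```python
import cmath
from itertools import combinations


def dft_columns(n, p):
    """First p columns of the unnormalized n x n DFT matrix, stored as a list of rows."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (row * col) for col in range(p)] for row in range(n)]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting (complex entries)."""
    a = [row[:] for row in matrix]
    size = len(a)
    result = 1.0 + 0j
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0j
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result                      # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return result


def smallest_minor(n, p):
    """Smallest |det| over all p x p sub-matrices built from p rows of the DFT columns."""
    U = dft_columns(n, p)
    return min(abs(det([U[r] for r in rows])) for rows in combinations(range(n), p))


print(smallest_minor(4, 2), smallest_minor(5, 3))
```

A strictly positive smallest minor confirms that any $p$ rows are linearly independent for the tested sizes.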

Theorem 5: The SD complexity exponent, given any
threaded code with $n = n_T = T$ that is decoded with
the natural column ordering or under any other thread-wise
grouping, is $c(r) = \bar{c}(r)$ where $\bar{c}(r)$ is given in Theorem 2.

Consequently, directly from Theorems 3 and 5 we have the
following result for DMT optimal threaded codes.

Theorem 6: Sphere decoding with thread-wise grouping of
any DMT optimal threaded code with $n = n_T = T$ achieves
DMT optimality with a SD complexity exponent of

$$c(r) = r(n - \lfloor r \rfloor - 1) + \big( n \lfloor r \rfloor - r(n - 1) \big)^{+} \tag{50}$$

which, for integer values of $r = k$, simplifies to

$$c(k) = k(n - k). \tag{51}$$

We briefly note that, as expected, the complexity increases
with increasing $n = n_T = T$ for any fixed $r$, which is quite
natural as the size of the codebook $\mathcal{X}$ and the signal space
dimension increase. One can however also note that $c(r)$ is
independent of the number of receive antennas $n_R$ (provided
$n_R \ge n_T$). This is specific to the DMT optimal behavior
and threaded structure of the codes, and may be explained
by the fact that even though the channel quality generally
improves by adding receive antennas, thus generally reducing
complexity, the same improvement also occurs in the error
probability performance of the code, and these two effects
cancel each other in the SD complexity exponent.

2) $2 \times 2$ approximately universal codes: We here go one
step further and identify a class of codes for which we
can state, without limitations on the actual code structure,
that $c(r) = \bar{c}(r)$ for any column ordering. In particular, we
establish this for the class of all $2 \times 2$ approximately universal
codes, i.e., all minimum delay codes that can achieve DMT
optimality over the $2 \times n_R$ channel irrespective of fading
statistics [21]. This is accomplished, albeit only for the specific
case of $n_T \times T = 2 \times 2$, by proving that (48) follows
from the so called non-vanishing determinant (NVD) condition
[32] which is well known to be a necessary and sufficient
condition for approximate universality. We consequently have
the following.

Theorem 7: Any $2 \times 2$ full rate approximately universal
linear dispersive code, irrespective of its structure, introduces
a SD complexity exponent of

$$c(r) = \min(r, 2 - r).$$

As the NVD property does not depend on the ordering of the
columns of $G$, it also follows that the conclusion of Theorem 7
holds irrespective of the column ordering.
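
The exponent $c(r) = r(n - \lfloor r \rfloor - 1) + (n\lfloor r \rfloor - r(n-1))^+$ stated for threaded codes with $n = n_T = T$ is easy to tabulate. The short Python sketch below, with a hypothetical function name, evaluates it and confirms on a grid that the $n = 2$ case agrees with $\min(r, 2 - r)$; it also exposes the non-monotonic shape for $n = 3$.

```python
import math


def sd_exponent(r, n):
    """c(r) = r(n - floor(r) - 1) + (n floor(r) - r(n - 1))^+ for n = n_T = T."""
    k = math.floor(r)
    return r * (n - k - 1) + max(n * k - r * (n - 1), 0.0)


# n = 2: the values should match min(r, 2 - r).
for r in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(r, sd_exponent(r, 2), min(r, 2.0 - r))

# n = 3: the exponent dips between the integer multiplexing gains 1 and 2.
print([sd_exponent(r, 3) for r in (1.0, 1.5, 2.0)])
```

At integer gains the expression collapses to $k(n - k)$, while between integers it is linear with a kink where the $(\cdot)^+$ term becomes active.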

V. Implications and Discussions 

A. Decoding complexity and information theoretic outages 

We recall that the claim of Theorem 2 may be expressed in
terms of the function (cf. (45a))

$$c(r : \alpha) \triangleq \sum_{i=1}^{n_T} T \min\left( \frac{r}{n_T} - 1 + \alpha_i,\ \frac{r}{n_T} \right)^{+}, \tag{52}$$

which provides a conditional, asymptotic, upper bound on the
sphere decoding complexity according to $N \,\dot{\le}\, \rho^{c(r : \alpha)}$ in terms
of the multiplexing gain $r$ and the singularity level $\alpha$. The
final upper bound $\bar{c}(r)$ in (45) is then given as the worst-case
$c(r : \alpha)$ over all singularity levels that occur with a probability
that is larger than or equal to the probability of error, given
asymptotically by the diversity gain $d(r)$ of the code.

The characterization of the DMT in [5] relies on the asymptotic
probability of outages at high SNR, i.e., the probability
that the i.i.d. Rayleigh fading AWGN channel given by

$$y_t = H x_t + w_t$$

with a power constraint $E\{\|x_t\|^2\} \le \rho$ cannot support an
asymptotic data-rate of $R = r \log \rho + o(\log \rho)$. As was shown
in [5], this occurs when the singularity levels belong to the
outage set $\mathcal{A}(r) = \{\alpha \mid \sum_i (1 - \alpha_i)^+ < r\}$, and the diversity
of the outage event is given by the most likely set of singularity
levels that satisfy this condition, i.e., $d(r) = \inf_{\alpha \in \mathcal{A}(r)} I(\alpha)$
where $I(\alpha)$ is given in (40).



If we restrict attention to the set of singularity levels whose
probability of occurring does not vanish exponentially fast, i.e.,
for which $I(\alpha) < \infty$ or equivalently $\alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0$, we
can for DMT optimal codes$^{9}$ make an interesting connection
between the decoding complexity and information theoretic
outages. In particular, as $d(r) = \inf_{\alpha \in \mathcal{A}(r)} I(\alpha)$ it follows that
$\sum_i (n_R - n_T + 2i - 1)\alpha_i \le d(r)$ if and only if $\sum_i (1 - \alpha_i)^+ \ge r$.
We can thus, for DMT optimal codes, equivalently express (45)
according to

\begin{align}
\bar{c}(r) = \max_{\alpha} \quad & c(r : \alpha) \tag{53a} \\
\text{s.t.} \quad & \sum_{i=1}^{n_T} (1 - \alpha_i)^{+} \ge r \tag{53b} \\
& \alpha_1 \ge \cdots \ge \alpha_{n_T} \ge 0, \tag{53c}
\end{align}

which may be interpreted as the worst-case complexity
(bound) over all channels that are not in outage. This significantly
strengthens the connection between channel and
decoding outages touched upon in Section III-C.
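
The interpretation as a worst case over non-outage channels can be illustrated numerically for $n_T = T = 2$. The Python sketch below evaluates the conditional exponent $c(r : \alpha) = \sum_i T \min(r/n_T - 1 + \alpha_i,\, r/n_T)^+$ on a grid of singularity levels $\alpha_1 \ge \alpha_2 \ge 0$, discards grid points in the outage set $\sum_i (1 - \alpha_i)^+ < r$, and reports the maximum. The function names, grid resolution and upper grid limit are illustrative assumptions.

```python
def c_cond(r, alpha, T):
    """Conditional exponent c(r : alpha) = sum_i T min(r/n_T - 1 + alpha_i, r/n_T)^+."""
    n_t = len(alpha)
    return sum(T * max(min(r / n_t - 1 + a, r / n_t), 0.0) for a in alpha)


def in_outage(r, alpha):
    """True if alpha lies in the outage set, i.e. sum_i (1 - alpha_i)^+ < r."""
    return sum(max(1 - a, 0.0) for a in alpha) < r


def worst_case_non_outage(r, T=2, steps=40, alpha_max=2.0):
    """Grid search of the largest c(r : alpha) over non-outage alpha_1 >= alpha_2 >= 0."""
    best = 0.0
    for i in range(steps + 1):
        for j in range(i + 1):                  # j <= i keeps alpha_1 >= alpha_2
            alpha = (alpha_max * i / steps, alpha_max * j / steps)
            if not in_outage(r, alpha):
                best = max(best, c_cond(r, alpha, T))
    return best


for r in (0.5, 1.0, 1.5):
    print(r, worst_case_non_outage(r), min(r, 2 - r))
```

In this $2 \times 2$ example the grid maximum matches $\min(r, 2 - r)$: raising $r$ enlarges each per-level term but shrinks the set of non-outage channels over which the maximum is taken.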

The concept is illustrated for $n_T = T = 2$ in Fig. 2 where
$c(r : \alpha)$ is plotted as a function of $\alpha = (\alpha_1, \alpha_2)$ over $\alpha_1 \ge
\alpha_2 \ge 0$. In this case, $\bar{c}(r) = \min(r, 2 - r)$. Note also here that
we know by Theorem 1 that $c(r) = \bar{c}(r)$. Singularity levels
that are in the outage region $\sum_i (1 - \alpha_i)^+ < r$ are shown in a
darker shade. It can be seen that increasing the multiplexing
gain $r$ increases the codeword density and codebook size and
consequently broadens the set of singularity levels that can
potentially lead to higher complexity. However, increasing the
multiplexing gain also reduces the set of channels that support
the data-rate, thus limiting the set of singularity levels for
which the decoder needs to be applied, leading to an overall
reduction in the SD complexity exponent as $r$ approaches its
maximum value.

Further, the connection to information theoretic outages
allows for an intuitive explanation of the result of Theorem 6
and in particular (51). To this end, it is illustrative to consider
a heuristic argument involving low rank channel matrices $H$.
As noted in [5], the typical outages at integer multiplexing
gains $r = k$ are caused by channels that are close to the set
of rank $k$ matrices, i.e., that have $n - k$ small singular values.
If we for the purpose of illustration assume that $H$ has rank
$k$, it follows that $I_T \otimes H$ for $T = n$, and $M$, have rank $nk$,
and consequently a null-space of dimension $n(n - k)$. This
implies that the $n(n - k) \times n(n - k)$ lower right block of $R$
is$^{10}$ identically equal to zero, and the sphere decoder pruning
criteria become totally ineffective up to and including layer
$n(n - k)$. As the size of $\mathbb{S}_\eta$ is $|\mathbb{S}_\eta| \doteq \rho^{k/n}$ for $r = k$, we see
that the number of nodes searched at layer $n(n - k)$ of the

$^{9}$To be precise, we are assuming here that we are working with approximately
universal codes for which it is known that errors are only likely when
the channel is in outage [21]. All explicitly constructed full rate DMT-optimal
codes known to date are also approximately universal.

$^{10}$This also requires that the first $nk$ columns of $R$ are linearly independent.
In fact, the rigorous treatment of this technical detail is largely responsible for
much of the difficulty in establishing the lower bounds on $c(r)$. In particular,
condition (48) in Lemma 2 guarantees that this happens with sufficiently high
probability, while Lemma 3 provides a perturbation analysis that allows us
to extend this intuitive reasoning to channels that are close to the set of rank
deficient matrices.

Fig. 2. Conditional SD complexity exponent bound $c(r : \alpha)$ as a function of singularity levels $\alpha = (\alpha_1, \alpha_2)$. Singularity levels that correspond to outage
events at the target multiplexing gain $r$ are shown in dark grey, while singularity levels capable of supporting the target multiplexing gain are shown in light
grey.



SD search tree is $|\mathbb{S}_\eta|^{n(n-k)} \doteq \rho^{k(n-k)}$ (cf. (51)). In order to
ensure close to optimal performance, the sphere decoder must
be able to decode for channels where $n - k$ singular values
are close to zero. However, channels with even more singular
values close to zero occur with a probability that is small in
relation to the outage probability or the probability of ML
decoder error, and can thus be safely ignored by the decoder.



B. A complexity bound that holds for all fading statistics 

It is perfectly conceivable that the sphere decoding com- 
plexity may rise under specific codes and under specific fading 
statistics that tend to regularly introduce channel instances that 
are difficult to decode for A natural question is then whether 
one can bound the complexity, irrespective of the code and 
of the statistical characterization of the channel. Considering 
Theorem 12] and the proof of this theorem, we can see that the 
i.i.d. Rayleigh fading assumption only enters through the rate 
function /(a). Consequently, we may directly restate Theo- 
rem |2] for other fading distributions after updating (145b) and 
(145 cl i with the appropriate rate-function I{a). Some relevant 
examples of rate-functions for other fading distributions are 
given in [33]. 

Further, regardless of which /(a) applies, the upper bound 
c(r) in Theorem |2] is non-decreasing in d{r) and maximized 
when d{r) corresponds to the outage exponent. In this case 
c(r) is, again, given by (|53] l, which does not explicitly depend 
the fading distribution other than through the assumption that 
P(a„T, < 0) vanishes exponentially fast; an assumption that 
holds for all reasonable distributions. This impUes that SD 
complexity exponent is universally upper bounded as (cf. 
Theorem O 

c{r) < -^{r{nT - \r\ - 1) + (rixH - fin-r - l))^^ 

for any full-rate code and statistical characterization of the 
channel. This is also clearly the tightest upper bound that can 
hold for all (full-rate) codes and fading statistics. 



C. Fast decodable codes 

In IMl-El a family of DMT optimal rix x T = 2 x 2 space- 
time codes called fast decodable codes |37| were constructed. 
The SD complexity exponent (and also its upper bound) 
provides an interesting approach for comparing the complexity 
of decoding regular codes and fast decodable codes. Before 
doing so it should be noted that these fast decodable codes 
are not, strictly speaking, of the form in (|3} as the real 
and imaginary part of each constituent symbol is dispersed 
separately. Nevertheless, the fast decodable codes may be 
decoded by an equivalent real valued sphere decoder that 
performs a search over a tree with 2k layers and 
branches per node, and we can compare the reported worst- 
case complexity of this real valued sphere decoder to the 
complexity of the complex valued sphere decoder considered 
herein. 

The fast decodable codes have the appealing property that
the upper right $4 \times 4$ block of the real valued $R \in \mathbb{R}^{8 \times 8}$ (cf.
(13)) is always a diagonal matrix, regardless of the particular
realization of $H$. While the regular real valued sphere decoder
for a $2 \times 2$ full rate code would perform a (bounded) search
over the entire tree, it is sufficient for the fast decodable codes
to (without loss of optimality) perform a search over only the 4
first layers, and extend each node at layer 4 to a valid codeword
through a faster, linear, ML decoding. This simplified version
of the real valued sphere decoder can be viewed as a search
over a regular tree where each node has $|\mathbb{S}_\eta|^{1/2}$ children up
to layer 4, but only one child per node for the 4 remaining
layers. Consequently, the worst-case number of nodes visited
by the simplified sphere decoder is
$5|\mathbb{S}_\eta|^{2} + \sum_{i=1}^{3} |\mathbb{S}_\eta|^{i/2} \doteq \rho^{r}$
as opposed to $\sum_{i=1}^{8} |\mathbb{S}_\eta|^{i/2} \doteq \rho^{2r}$ for the regular
real valued sphere decoder, cf. [37]. Thus, fast decodability
implies a reduction by a factor of 2 in the worst-case SNR
exponent, which is significant at high SNR.
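
The node counts in this comparison follow directly from the tree description. The Python sketch below counts the worst-case number of visited nodes in an 8-layer real valued search tree in which every node has `branching` children (standing in for $|\mathbb{S}_\eta|^{1/2}$) up to a given layer and a single child afterwards. The function name and the example branching factor are illustrative assumptions.

```python
def worst_case_nodes(branching, free_layers, total_layers=8):
    """Total nodes when each node has `branching` children up to `free_layers`, then one."""
    nodes_at_layer = 1
    total = 0
    for layer in range(1, total_layers + 1):
        if layer <= free_layers:
            nodes_at_layer *= branching     # full branching in the searched layers
        total += nodes_at_layer             # one child per node afterwards
    return total


s = 4  # example branching factor, i.e. |S_eta| = 16
print(worst_case_nodes(s, free_layers=4))   # simplified decoder: s + s^2 + s^3 + 5 s^4
print(worst_case_nodes(s, free_layers=8))   # regular decoder: s + s^2 + ... + s^8
```

The simplified count is dominated by $5 s^4 = 5|\mathbb{S}_\eta|^2$ and the regular count by $s^8 = |\mathbb{S}_\eta|^4$, which is the factor of 2 in the exponent noted above.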

However, this worst-case SNR exponent of the simplified 
SD algorithm should be viewed in light of the SD complexity 
exponent induced by any 2x2 approximately universal code 
as given by Theorem |7] i.e., c(r) — min(r, 2 — r). The worst- 
case SNR exponent of the regular sphere decoder and of the 
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simplified sphere decoder, and the SD complexity exponent 
c(r) are shown in Fig. |3] For multiplexing gains lower than 
or equal to 1, the SD complexity exponent and the worst-case 
SNR exponent of the simplified sphere decoder actually coin- 
cide, while the SD complexity exponent is strictly lower for 
multiplexing gains higher than one. The interpretation is that 
a run-time constrained sphere decoder will yield asymptotic 
ML performance for any 2x2 approximately universal code 
at a complexity that is comparable to the reported worst-case 
complexity of the fast decodable codes at low multiplexing 
gains, and significantly better at high multiplexing gains. Thus, 
in a sense, all approximately universal codes are fast decodable 
at high SNR. However, we hasten to add that the fast decodable 
structure can naturally still be desirable in many cases of 
practical interest. 
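The comparison of the three exponents for the 2×2 case can be tabulated with a few lines. This is a minimal sketch: the function names are assumptions, and the expressions are simply $2r$, $r$ and $\min(r, 2 - r)$ as stated in the text.

```python
def regular_worst_case_exponent(r):
    """Worst-case SNR exponent of the regular real valued sphere decoder."""
    return 2.0 * r


def simplified_worst_case_exponent(r):
    """Worst-case SNR exponent of the simplified (fast decodable) search."""
    return r


def sd_complexity_exponent(r):
    """SD complexity exponent c(r) = min(r, 2 - r) for 2x2 codes."""
    return min(r, 2.0 - r)


def compare(r):
    """Return the three exponents at multiplexing gain r, 0 <= r <= 2."""
    return (regular_worst_case_exponent(r),
            simplified_worst_case_exponent(r),
            sd_complexity_exponent(r))
```

For instance, `compare(1.5)` returns a complexity exponent of 0.5 against a simplified worst-case exponent of 1.5, which is the gap at high multiplexing gain discussed above.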

D. The applicability of Lemma 2

Finally, we discuss the application of Lemma 2 to codes
not considered herein. To this end, note that for any given
generator matrix $G$ of some code not covered by Section IV-D
it should be clear that if (48) holds then Lemma 2 could be
used to establish a tight lower bound on $c(r)$. This said, we
also wish to caution the reader that (48) only represents a
sufficient condition for $c(r) = \bar{c}(r)$. It does not necessarily
follow that $c(r) < \bar{c}(r)$ if (48) is not true. In other words, the
question of whether there are code designs that improve upon the
bound $\bar{c}(r)$ is not answered in the positive by finding code
designs for which (48) does not hold.

As for testing (48), it should also be noted that one does not
have to restrict the search for $U_p$ to the set of unitary matrices.
Any full rank matrix $A \in \mathbb{C}^{n_T \times p}$ can be factored, e.g., by the
QR decomposition, as $U_p T = A$ where $T \in \mathbb{C}^{p \times p}$ has rank
$p$ and where $U_p$ is unitary. Hence, as $(I_T \otimes T^H)$ is full rank,
it follows that

$$(I_T \otimes A^H) G|_p = (I_T \otimes T^H)(I_T \otimes U_p^H) G|_p$$

is rank deficient if and only if (48) fails to hold. Hence,
the statement of Lemma 2 could be phrased in terms of the
existence of any $U_p$, not necessarily unitary.



where $|\cdot|$ denotes the determinant, and note that $p(A)$ is
a polynomial in the elements of $A$. It can thus be seen
that if $p(A) \neq 0$ for some $A \in \mathbb{C}^{n_T \times p}$, i.e., $p(A)$ is
not the zero polynomial, it follows that the set of $A$ for
which $p(A) = 0$ has zero Lebesgue measure. It is then a
straightforward extension to show that (48) holds either never
or almost always with respect to the set of unitary matrices
$U_p$ over the Stiefel manifold (i.e., the set of all unitary $n_T \times p$
matrices) endowed with the Haar (uniform) measure. This
suggests a rather interesting conceptual method for verifying
(48). Given a specific generator matrix one could, at least in
theory, test the condition of Lemma 2 by selecting $U_p$ (or $A$)
uniformly at random, and the condition of Lemma 2 would be
proven with probability one if true. However, finite precision
computations will limit the practical applicability of such an
approach, although symbolic computations could potentially
be a way to test a specific code design.
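The randomized test described above can be sketched in plain Python. This is a hedged illustration only: it assumes that (48) amounts to $(I_T \otimes A^H) G|_p$ having full rank $pT$, it draws $A$ with i.i.d. complex Gaussian entries rather than from the Haar measure, and all function names, the tolerance `tol` and the example generator matrices are hypothetical. As the text cautions, a finite precision rank decision is only indicative.

```python
import random


def conj_transpose(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def kron(A, B):
    """Kronecker product of A and B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def rank(M, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [list(row) for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        pivot = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r


def rank_condition_holds(G, n_t, T, p, rng):
    """Test full rank of (I_T kron A^H) G|_p for one random n_t x p matrix A."""
    A = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(p)]
         for _ in range(n_t)]
    identity = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    K = kron(identity, conj_transpose(A))   # (p*T) x (n_t*T)
    G_p = [row[:p * T] for row in G]        # first p*T columns of G
    return rank(matmul(K, G_p)) == p * T
```

For a toy 4×4 identity generator with $n_T = T = 2$ and $p = 1$ the test fails for every $A$, whereas a permuted identity whose first two columns are the first and third unit vectors passes almost surely; neither matrix is a code from the paper.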

VI. Conclusion 

The work addressed the open question of identifying the
computational cost of near-ML sphere decoding. In the high-SNR
high-rate regime, the introduced SD complexity exponent
asymptotically described this cost, concisely revealing
the cost's natural dependencies on the codeword density, the
codebook size, as well as on the SNR, dimensions and fading
characteristics of the wireless channel. This exponent currently
sets the bar with respect to the computational reserves required
for decoding with arbitrarily close to ML performance, and the
clear challenge is now to identify transceivers with a smaller
complexity exponent that can still guarantee a vanishing ML
gap.

The simplicity of the provided guarantees can offer in- 
sight into designing robust encoders, decoders, and time- 
out policies, as well as guidelines for network planning in 
settings where rate, reliability, and computational complexity 
are principal intertwined concerns. Such guarantees can apply 
towards substantial savings in energy, processing power, and 
hardware. 

Appendix A 
Proof of Lemma 1 

In the following, we provide a proof of Lemma 1, starting
with the upper bound in (26) and then establishing the lower
bound in (27). To this end, note that the length of the $i$th
semi-axis of $\mathcal{E}$, denoted $e_i$, is given by

$$e_i = \frac{\xi}{\sigma_i(D)}.$$

Let $\mathcal{C}_1$ be the smallest orthotope (box), aligned with and
containing $\mathcal{E}$, i.e., $\mathcal{C}_1$ is an orthotope with side lengths $2e_i$
(see Fig. 4). Let $\mathcal{C}_2$ be a hypercube with side-length $2\sqrt{n}\eta$,
centered at the origin and aligned with $\mathcal{C}_1$ (see Fig. 4). As the
diagonal of $\mathcal{B}$ is $2\sqrt{n}\eta$ it follows that $\mathcal{B} \subset \mathcal{C}_2$, regardless of the
orientation of $\mathcal{C}_2$. Let $\mathcal{C}_3$ be given by $\mathcal{C}_3 = \mathcal{C}_1 \cap \mathcal{C}_2$ and note that
$\mathcal{E} \cap \mathcal{B} \subset \mathcal{C}_3$ as $\mathcal{E} \subset \mathcal{C}_1$ and $\mathcal{B} \subset \mathcal{C}_2$. As $\mathcal{C}_1$ and $\mathcal{C}_2$ are aligned,
it follows that $\mathcal{C}_3$ is also an orthotope. Let $l_1, \ldots, l_n$ denote
the side-lengths of $\mathcal{C}_3$ and note that $l_i \leq \min(2e_i, 2\sqrt{n}\eta)$.

Fig. 4. Illustration of the proof of (26) in Lemma 1 in the case of n = 2.
The lemma provides an upper bound on the number of integer points within
the shaded area, corresponding to the intersection of the ellipsoid and the
constellation boundary.

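The mean value theorem [27] states that for any convex
body (or set) $\mathcal{C} \subset \mathbb{R}^n$ it holds that

$$\mathrm{Vol}(\mathcal{C}) = \int_{u \in \mathcal{U}} |\mathbb{Z}^n \cap (\mathcal{C} + u)| \, du \tag{54}$$

where

$$\mathcal{U} \triangleq \left[-\tfrac{1}{2}, \tfrac{1}{2}\right]^n$$

denotes the unit cube in $\mathbb{R}^n$, i.e., the volume of the set equals
the average number of integer points in the set when perturbed
by a uniform random perturbation over the unit cube [27]. This
statement is a rigorous version of the intuitive notion that the
number of integer points in $\mathcal{C}$ may be approximated by its
volume, and it can be used to obtain a non-random upper
bound on the number of integer points in $\mathcal{C}_3$. To this end, let
$\mathcal{C}_4$ be the orthotope, aligned with and centered around $\mathcal{C}_3$, with
side lengths $l_i + \sqrt{n}$ (see Fig. 4). By construction, it follows
that

$$\mathcal{C}_3 \subset \mathcal{C}_4 + u$$

for any $u \in \mathcal{U}$. It therefore follows by (54) that

$$|\mathcal{C}_3 \cap \mathbb{Z}^n| \leq \mathrm{Vol}(\mathcal{C}_4) = \prod_{i=1}^{n} \left[\sqrt{n} + l_i\right],$$

where

$$l_i \leq \min\left(\frac{2\xi}{\sigma_i(D)}, 2\sqrt{n}\eta\right).$$

As $\mathcal{E} \cap \mathcal{B} \subset \mathcal{C}_3$ the upper bound in (26) follows.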



In order to establish the lower bound in (27) we may
redefine $\mathcal{C}_1$ to be the orthotope, aligned with the semi-axes
of $\mathcal{E}$ and with the same center, having side-lengths

$$b_i = \frac{2\xi}{\sqrt{n}\,\sigma_i(D)},$$

for $i = 1, \ldots, n$. Now, by construction $\mathcal{C}_1 \subset \mathcal{E}$ which implies
that $|\mathcal{C}_1 \cap \mathbb{Z}^n| \leq |\mathcal{E} \cap \mathbb{Z}^n|$. Let $\mathcal{C}_2$ be another orthotope, aligned
with $\mathcal{C}_1$ and with side-lengths $\max(b_i - \sqrt{n}, 0)$. It follows that
$\mathcal{C}_2 + u \subset \mathcal{C}_1$ for any $u \in \mathcal{U}$. By reasoning similar to what is
used in the proof of the upper bound (cf. (54)) it follows that

$$|\mathcal{E} \cap \mathbb{Z}^n| \geq \mathrm{Vol}(\mathcal{C}_2) = \prod_{i=1}^{n} \max(b_i - \sqrt{n}, 0),$$

which establishes the lower bound in (27).
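The two bounds can be checked by brute force in the case $n = 2$. The sketch below assumes that the ellipsoid is $\mathcal{E} = \{d : \|c - Dd\| \leq \xi\}$ and that the constellation boundary $\mathcal{B}$ is the box $|d_j| \leq \eta$; the function names and the example values of `D`, `c`, `xi` and `eta` are illustrative and not taken from the paper.

```python
import math
from itertools import product


def singular_values_2x2(D):
    """Singular values (largest first) of a real 2x2 matrix D."""
    (a, b), (c, d) = D
    trace = a * a + b * b + c * c + d * d     # trace of D^T D
    det = a * d - b * c                       # det(D^T D) = det**2
    disc = math.sqrt(max(trace * trace - 4.0 * det * det, 0.0))
    return (math.sqrt((trace + disc) / 2.0),
            math.sqrt(max((trace - disc) / 2.0, 0.0)))


def count_points(D, c, xi, eta=None):
    """Count integer d with ||c - D d|| <= xi (and |d_j| <= eta if given)."""
    (a, b), (e, f) = D
    det = a * f - b * e
    # Centre of the ellipse, d0 = D^{-1} c.
    d0 = ((f * c[0] - b * c[1]) / det, (a * c[1] - e * c[0]) / det)
    reach = xi / min(singular_values_2x2(D))  # longest semi-axis
    ranges = [range(math.floor(x - reach), math.ceil(x + reach) + 1)
              for x in d0]
    count = 0
    for d in product(*ranges):
        if eta is not None and max(abs(d[0]), abs(d[1])) > eta:
            continue
        r0 = c[0] - (a * d[0] + b * d[1])
        r1 = c[1] - (e * d[0] + f * d[1])
        if math.hypot(r0, r1) <= xi:
            count += 1
    return count


def lemma1_bounds(D, xi, eta):
    """Upper bound (26) and lower bound (27) for n = 2."""
    root_n = math.sqrt(2)
    upper = lower = 1.0
    for s in singular_values_2x2(D):
        upper *= root_n + min(2.0 * xi / s, 2.0 * root_n * eta)
        lower *= max(2.0 * xi / (root_n * s) - root_n, 0.0)
    return upper, lower
```

Calling `lemma1_bounds(D, xi, eta)` and comparing the result with `count_points(D, c, xi)` and `count_points(D, c, xi, eta)` shows how loose the box arguments are for a given ellipse.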

Appendix B 
Proof of Lemma 2 

Let $\alpha^* = (\alpha_1^*, \ldots, \alpha_{n_T}^*)$ be an optimal point of (45) and
let $q$ be the largest integer for which (cf. (45a))

$$\alpha_q^* > 1 - \frac{r}{n_T}. \tag{55}$$

Note that we can without loss of generality assume that $q \geq 1$
as otherwise $\bar{c}(r) = 0$ (cf. (45a)) and $c(r) = \bar{c}(r)$ would be a
trivial statement. It follows that $\alpha_i^* > 0$ for $i = 1, \ldots, q$ and
we may also without loss of generality assume that $\alpha_i^* \leq 1$
for $i = 1, \ldots, n_T$ as the objective in (45) does not increase
in $\alpha_i$ beyond $\alpha_i = 1$. The goal will be to show that layer
$k = qT$ of the sphere decoder contains close to $\rho^{\bar{c}(r)}$ nodes
with a probability that is large with respect to the probability
of decoding error $P(\hat{X}_{ML} \neq X) \doteq \rho^{-d(r)}$.

To this end, let $H = U \Sigma V^H$ be the singular value
decomposition of $H$, where

$$\Sigma \triangleq \mathrm{Diag}(\sigma_1(H), \ldots, \sigma_{n_T}(H))$$

and where $U^H U = I$. Let $U_p$ denote the last $p \triangleq n_T - q$
columns of $U$ (corresponding to the $p$ largest singular values)
and let $\alpha = (\alpha_1, \ldots, \alpha_{n_T})$ be the random vector of singularity
levels given by (32). Now, consider the set of conditions (or
events) given by

$$\Omega_1 \triangleq \{\alpha_i^* - 2\delta < \alpha_i < \alpha_i^* - \delta,\ i = 1, \ldots, q,\quad 0 < \alpha_i < \delta,\ i = q+1, \ldots, n_T\}, \tag{56a}$$

for some given (small) $\delta > 0$,

$$\Omega_2 \triangleq \{\sigma_1((I_T \otimes U_p^H) G|_p) \geq u\}, \tag{56b}$$

for some given $u > 0$,

$$\Omega_3 \triangleq \{\|Q^H w\| \leq 1\}, \tag{56c}$$

and

$$\Omega_4 \triangleq \{\|s\| \leq \tfrac{1}{2}\eta\}. \tag{56d}$$

Note also that by choosing $\delta$ sufficiently small, we may
without loss of generality assume that $\Omega_1$ implies that $\alpha_i > 0$
for all $i = 1, \ldots, n_T$.

The proof is structured as follows: First, in
Sections B-A and B-B it is established that (56) represent





sufficient conditions for the number of nodes $N_k$ visited in
layer $k = qT$ to be close to $\rho^{\bar{c}(r)}$. Then, in Section B-C it is
established that the set of conditions in (56) are simultaneously
satisfied with a probability that is large with respect to the
probability of error.

A. The Constellation boundary 

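We begin by proving that, given (56), the constellation
boundary may be ignored, i.e., that $\mathbb{S}_\eta^k$ may be replaced by
$\mathbb{S}_\infty^k$ in (13) without changing the result, thus making the lower
bound (27) in Lemma 1 applicable. To this end, let $\hat{s}_k \in \mathbb{S}_\infty^k$
be an arbitrary point in the $k$-dimensional infinite constellation
(i.e., the Gaussian integer lattice) and assume that $\hat{s}_k$ satisfies
the sphere constraint at layer $k$, i.e.,

$$\|r_k - R_k \hat{s}_k\| \leq \xi.$$

Note that $r_k = R_k s_k + v_k$, where $s_k$ denotes the last $k$
components of the transmitted symbol vector $s \in \mathbb{S}_\eta^\kappa$ and
where $v_k$ denotes the last $k$ components of $v \triangleq Q^H w$. It
follows that

$$\|r_k - R_k \hat{s}_k\| = \|R_k(s_k - \hat{s}_k) + v_k\| \geq \sigma_1(R_k)\|s_k - \hat{s}_k\| - \|v_k\|$$

which implies that

$$\|\hat{s}_k - s_k\| \leq \frac{1}{\sigma_1(R_k)}\left(\xi + \|v_k\|\right) \tag{57a}$$

and

$$\|\hat{s}_k\| \leq \|s_k\| + \frac{1}{\sigma_1(R_k)}\left(\xi + \|v_k\|\right). \tag{57b}$$

By the interlacing property of singular values (cf. (29)) it
further follows that

$$\sigma_1(R_k) \geq \theta\gamma\sigma_1(H) \doteq \rho^{\frac{1}{2} - \frac{rT}{2\kappa} - \frac{1}{2}\alpha_1} \,\dot{\geq}\, \rho^{\frac{1}{2}\delta - \frac{rT}{2\kappa}}$$

where we recall that $\theta \triangleq \rho^{\frac{1}{2} - \frac{rT}{2\kappa}}$ is the power scaling and
$\gamma \triangleq \sigma_1(G) > 0$, and where the last inequality is implied by
(56a) and $\alpha_1^* \leq 1$. As $\xi \doteq \rho^0$ and $\|v_k\| \leq \|Q^H w\| \leq 1$ by
(56c) it follows that

$$\frac{1}{\sigma_1(R_k)}\left(\xi + \|v_k\|\right) \,\dot{\leq}\, \rho^{\frac{rT}{2\kappa} - \frac{1}{2}\delta}.$$

By (57b), (56d), $\|s_k\| \leq \|s\| \leq \frac{1}{2}\eta$ and since
$\rho^{\frac{rT}{2\kappa} - \frac{1}{2}\delta} \leq \frac{1}{2}\eta$, it follows that

$$\|\hat{s}_k\| \leq \eta$$

given that $\rho$ is sufficiently large. This implies that $\hat{s}_k \in \mathbb{S}_\eta^k$.
Thus, any integer point that satisfies the sphere constraint must
also belong to the constellation, and we can proceed using (27)
to lower bound the complexity.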

B. Singular value bounds 

We proceed to provide bounds on the singular values of $R_k$
in order to lower bound the number of nodes visited in layer
$k = qT$. However, as stated previously the interlacing theorem
is, unlike in the derivation of the upper bound, not sufficient
for our purposes. Instead, we consider the following lemma,
proven in Appendix C.

Lemma 3: Let $A \in \mathbb{C}^{m \times n}$, $m \geq n$, be an arbitrary matrix
and $QR = A$ be the QR decomposition of $A$. Partition $A$,
$Q$ and $R$ according to

$$[A_1 \;\; A_2] = [Q_1 \;\; Q_2] \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}$$

where $A_1 \in \mathbb{C}^{m \times (n-k)}$ and $R_{22} \in \mathbb{C}^{k \times k}$. Then, assuming that
$\sigma_i(A) < \sigma_1(A_1)$ for $i = 1, \ldots, k$, it holds that

$$\sigma_i(R_{22}) \leq \left[\frac{\sigma_n(A)}{\sigma_1(A_1)} + 1\right] \sigma_i(A). \tag{58}$$

Applied to the effective channel matrix $M$ it follows that

$$\sigma_i(R_k) \leq \left[\frac{\sigma_\kappa(M)}{\sigma_1(M_1)} + 1\right] \sigma_i(M) \tag{59}$$



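where $M_1$ contains the first $pT$ columns ($p \triangleq n_T - q$) of
$M$, assuming that $\sigma_i(M) < \sigma_1(M_1)$ as will be shown for
$i = 1, \ldots, qT$ later. In order to lower bound $\sigma_1(M_1)$ note
that

$$M_1 = \theta (I_T \otimes H) G|_p,$$

where $G|_p$ denotes the first $pT$ columns of $G$ and

$$M_1^H M_1 = \theta^2 G|_p^H (I_T \otimes H^H H) G|_p.$$

As

$$H^H H \succeq \sigma_{q+1}(H^H H)\, U_p U_p^H$$

where $U_p$ denotes the matrix containing the $p$ singular vectors
corresponding to the $p$ largest singular values, and where $A \succeq B$
denotes that $A - B$ is positive semi-definite, it follows that

$$M_1^H M_1 \succeq \theta^2 \sigma_{q+1}(H^H H)\, G|_p^H (I_T \otimes U_p U_p^H) G|_p.$$

Considering the smallest singular value of $M_1^H M_1$ yields

$$\sigma_1(M_1^H M_1) \geq \theta^2 \sigma_{q+1}(H^H H)\, \sigma_1\!\left(G|_p^H (I_T \otimes U_p U_p^H) G|_p\right)$$

and consequently

$$\sigma_1(M_1) \geq u\theta\sigma_{q+1}(H) = u\theta\rho^{-\frac{1}{2}\alpha_{q+1}} \,\dot{\geq}\, \rho^{\frac{1}{2} - \frac{rT}{2\kappa} - \frac{1}{2}\delta}, \tag{60}$$

where the first inequality follows by (56b) and the last
inequality follows by (56a) together with $\theta \triangleq \rho^{\frac{1}{2} - \frac{rT}{2\kappa}}$ and as
$u > 0$ is fixed (independent of $\rho$). Further,

$$\sigma_i(M) = \theta\sigma_i((I_T \otimes H)G) \leq \theta\Gamma\sigma_{\iota(i)}(H)$$

where $\Gamma \triangleq \sigma_{\max}(G) = \sigma_\kappa(G)$, and it follows from (56a) that

$$\sigma_i(M) \,\dot{\leq}\, \rho^{\frac{1}{2} - \frac{rT}{2\kappa} - \frac{1}{2}\alpha^*_{\iota(i)} + \delta} \tag{61}$$

for $i = 1, \ldots, qT$. As $\alpha_i^* > 0$ for $i = 1, \ldots, q$, it follows
by comparing (60) and (61) that $\sigma_i(M) < \sigma_1(M_1)$ for
$i = 1, \ldots, qT$, given that $\delta$ is sufficiently small and that $\rho$
is sufficiently large, making Lemma 3 applicable for $k = qT$.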
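For the maximal singular value of $M$ we have (cf. (60))

$$\sigma_\kappa(M) \,\dot{\leq}\, \rho^{\frac{1}{2} - \frac{rT}{2\kappa} - \frac{1}{2}\alpha_{n_T}} \leq \rho^{\frac{1}{2} - \frac{rT}{2\kappa}}$$

where the last inequality follows as $\alpha_{n_T} > 0$. Combined with
(60) it follows that

$$\frac{\sigma_\kappa(M)}{\sigma_1(M_1)} + 1 \,\dot{\leq}\, \rho^{\frac{1}{2}\delta}$$

and from (59) and (61) that

$$\sigma_i(R_k) \,\dot{\leq}\, \rho^{\frac{1}{2} - \frac{rT}{2\kappa} - \frac{1}{2}\alpha^*_{\iota(i)} + \frac{3}{2}\delta}$$

for $i = 1, \ldots, qT$. Consequently (cf. (27)),

$$\frac{2\xi}{\sqrt{2k}\,\sigma_i(R_k)} \,\dot{\geq}\, \rho^{\frac{rT}{2\kappa} + \frac{1}{2}\alpha^*_{\iota(i)} - \frac{1}{2} - \frac{3}{2}\delta} \tag{62}$$

given that $\delta$ is sufficiently small. Further, as

$$\frac{rT}{2\kappa} + \frac{1}{2}\alpha^*_{\iota(i)} - \frac{1}{2} = \frac{1}{2}\left(\frac{r}{n_T} + \alpha^*_{\iota(i)} - 1\right) > 0$$

for $i = 1, \ldots, k$ where $k = qT$ by the condition for $q$ in (55),
it follows that the lower bound in (62) tends to infinity with
increasing $\rho$ provided that $\delta$ is small, and we may conclude
that

$$\left[\frac{2\xi}{\sqrt{2k}\,\sigma_i(R_k)} - \sqrt{2k}\right]^2 \doteq \left[\frac{2\xi}{\sqrt{2k}\,\sigma_i(R_k)}\right]^2 \,\dot{\geq}\, \rho^{\frac{rT}{\kappa} + \alpha^*_{\iota(i)} - 1 - 3\delta} \tag{63}$$

where the last inequality holds, again, provided $\delta$ is small.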

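Combining (27) in Lemma 1 and (63), and making the real
valued expansion as we did for Theorem 2, yields a lower
bound on the number of nodes visited by the sphere decoder
in layer $k = qT$ given by

$$N_k \,\dot{\geq}\, \rho^{\sum_{i=1}^{k}\left(\frac{rT}{\kappa} + \alpha^*_{\iota(i)} - 1\right) - 3k\delta} \tag{64}$$

where

$$\sum_{i=1}^{k}\left(\frac{rT}{\kappa} + \alpha^*_{\iota(i)} - 1\right) = \sum_{i=1}^{q} T\left(\frac{r}{n_T} + \alpha_i^* - 1\right). \tag{65}$$

By noting that

$$0 < \frac{r}{n_T} + \alpha_i^* - 1 \leq \frac{r}{n_T}$$

for $i = 1, \ldots, q$ by the assumption that $\alpha_i^* \leq 1$ and the
definition of $q$, it follows that

$$T\left(\frac{r}{n_T} + \alpha_i^* - 1\right) = T \min\left(\frac{r}{n_T} - 1 + \alpha_i^*, \frac{r}{n_T}\right)^+ \tag{66}$$

for $i = 1, \ldots, q$. Further, as

$$\frac{r}{n_T} + \alpha_i^* - 1 \leq 0$$

for $i > q$, also by the definition of $q$, the right hand side of
(66) is equal to 0 for $i > q$. Thus, it follows that

$$\sum_{i=1}^{q} T\left(\frac{r}{n_T} + \alpha_i^* - 1\right) = \sum_{i=1}^{n_T} T \min\left(\frac{r}{n_T} - 1 + \alpha_i^*, \frac{r}{n_T}\right)^+ = \bar{c}(r)$$

where the last equality follows due to the optimality of $\alpha^*$ in
(45), and we then obtain from (64) that

$$N_k \geq \rho^{\bar{c}(r) - 3k\delta}$$

given that $\rho$ is sufficiently large and that $\delta > 0$ is small.
However, as $\delta > 0$ can be chosen arbitrarily small it is
concluded that (56) represents sufficient conditions under
which the number of nodes visited is arbitrarily close to the
upper bound of $\rho^{\bar{c}(r)}$ given by Theorem 2.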
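The step that turns the per-layer sum into the objective of (45) can be illustrated numerically. The sketch below evaluates both sides of the identity for a given vector of singularity levels; it does not solve the optimization in (45), and the function names as well as the example vectors are assumptions made for illustration.

```python
def objective_45(alpha, r, n_t, T):
    """Sum over i of T * max(min(r/n_t - 1 + alpha_i, r/n_t), 0)."""
    return sum(T * max(min(r / n_t - 1 + a, r / n_t), 0.0) for a in alpha)


def layer_sum(alpha, r, n_t, T):
    """Sum of T * (r/n_t + alpha_i - 1) over the i with alpha_i > 1 - r/n_t."""
    return sum(T * (r / n_t + a - 1) for a in alpha if a > 1 - r / n_t)
```

For any `alpha` with entries in $[0, 1]$ the two functions agree, which is exactly what (66) and the case $i > q$ assert. For example, `alpha = (1.0, 0.0)` with `r = 1`, `n_t = 2`, `T = 2` gives the value 1 for both.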



C. Probabilities 

We now turn to the probability that the conditions imposed
by (56) are simultaneously satisfied. The events in (56) are
independent.$^{11}$ As (56) implies $N \geq \rho^{\bar{c}(r) - 3k\delta}$ it follows that

$$P\left(N \geq \rho^{\bar{c}(r) - 3k\delta}\right) \geq \prod_{i=1}^{4} P(\Omega_i)$$

given that $\rho$ is sufficiently large.

The assumption made in Lemma 2, i.e., condition (48),
guarantees that

$$\sigma_1((I_T \otimes U_p^H) G|_p) > 0$$

for some $U_p$. However, by the continuity of singular values
it follows for sufficiently small $u > 0$ (cf. (56b)) that
$P(\Omega_2) > 0$, which implies $P(\Omega_2) \doteq \rho^0$ as $\Omega_2$ is independent
of $\rho$. The same is true for $\Omega_3$, i.e., $P(\Omega_3) \doteq \rho^0$. It may also
be shown that $P(\Omega_4)$ converges to a strictly positive limit$^{12}$
and that therefore $P(\Omega_4) \doteq \rho^0$. It follows that

$$P\left(N \geq \rho^{\bar{c}(r) - 3k\delta}\right) \,\dot{\geq}\, P(\Omega_1).$$

The probability of $\Omega_1$ may again be assessed by using large
deviation techniques. In particular, it is noted that the
condition imposed by $\Omega_1$ (cf. (56a)) specifies an open set of
admissible $\alpha$. Applying (38) and (40) yields

$$-\lim_{\rho \to \infty} \frac{\log P(\Omega_1)}{\log \rho} \leq \sum_{i=1}^{q} (n_R - n_T + 2i - 1)(\alpha_i^* - 2\delta) \leq d(r) - 2(n_R - n_T + q)q\delta < d(r), \tag{67}$$

where the second inequality follows from (45b) and the
feasibility of $\alpha^*$. Thus,

$$-\lim_{\rho \to \infty} \frac{\log P\left(N \geq \rho^{\bar{c}(r) - 3k\delta}\right)}{\log \rho} < d(r). \tag{68}$$

By the definition of the SD complexity exponent $c(r)$
it follows by (68) that $c(r) \geq \bar{c}(r) - 3k\delta$. As the bound
holds for arbitrarily small $\delta > 0$, it follows that $c(r) = \bar{c}(r)$,
establishing the tightness of (45) and Lemma 2.
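The arithmetic behind the second inequality in (67) is that shrinking each of the first $q$ levels by $2\delta$ lowers the weighted sum by exactly $2(n_R - n_T + q)q\delta$. A minimal sketch with exact rational arithmetic follows; the function name and the example numbers are hypothetical.

```python
from fractions import Fraction


def weighted_level_sum(levels, n_r, n_t):
    """Sum over i = 1..len(levels) of (n_r - n_t + 2i - 1) * levels[i-1]."""
    return sum((n_r - n_t + 2 * i - 1) * a
               for i, a in enumerate(levels, start=1))


def shifted_sum(levels, n_r, n_t, delta):
    """The same sum with every level reduced by 2 * delta."""
    return weighted_level_sum([a - 2 * delta for a in levels], n_r, n_t)
```

With `q = len(levels)`, the difference `weighted_level_sum(...) - shifted_sum(...)` equals `2 * (n_r - n_t + q) * q * delta`, since the weights $n_R - n_T + 2i - 1$ sum to $q(n_R - n_T + q)$.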

D. The extension to adaptive radius updates 

The derivations above make the assumption that the search
radius $\xi$ is a non-random function of $\rho$ that satisfies $\xi \doteq \rho^0$.
It is thus natural to ask if the SD complexity exponent could
potentially be improved by choosing $\xi$ adaptively based on
the problem data $H$ and $Y$, as is done when using, e.g.,
the Schnorr-Euchner SD algorithm implementation [2], [3].
However, we will show here that it cannot, and therefore that
the assumption of a non-adaptive radius is made without loss
of generality. The argument is similar to the one in [10].

$^{11}$The independence of $\Omega_1$ and $\Omega_2$ follows by the i.i.d. Rayleigh assumption
on $H$, which makes the singular values and singular vectors of $H^H H$
independent [38].

$^{12}$This is provided that $r > 0$ in which case the subset of the
constellation defined by $\Omega_4$ contains an asymptotically deterministic and
strictly positive fraction of the full constellation, cf. the proof of Lemma
1 in [8]. When $r = 0$ the statement that $\bar{c}(r)$ is tight is trivial as $\bar{c}(0) = 0$.

To this end, we note that even if $\xi$ is adaptively chosen,
it cannot be chosen smaller than the distance to the closest
codeword, i.e., the (square root of the) minimum metric in
(13), because otherwise no codeword will be chosen by the
search. As was already argued in Section III-B the distance to
the transmitted codeword is $\|Q^H w\|$ and $P(\|Q^H w\| > \epsilon)$ can
be made arbitrarily close to one by appropriately choosing
$\epsilon > 0$. Consequently, whenever the transmitted codeword $s$
is the ML decision, i.e., yields the minimum metric in (13),
we could use $\xi > \epsilon$ where $\epsilon \doteq \rho^0$ as an (arbitrarily likely)
probabilistic lower bound on an adaptive search radius in
the proof of the lower bounds on the complexity (e.g., in (63))
and obtain the same SNR exponents in these bounds.

Thus, to complete the argument we must only rule out
the possibility that the probability that $s$ yields the minimum
metric in (13) under the conditions of $\Omega$ imposed in (56) is
small, as the lower bound is derived explicitly under $\Omega$. To
this end, assume that it is false, i.e., that $P(\hat{s}_{ML} \neq s \mid \Omega) > \epsilon$
for some (fixed and SNR independent) $\epsilon > 0$. In this case we
could lower bound the error probability of the ML decoder
according to (cf. (67))

$$P(\hat{s}_{ML} \neq s) \geq P(\hat{s}_{ML} \neq s \mid \Omega) P(\Omega) \geq \epsilon P(\Omega) \,\dot{>}\, \rho^{-d(r)},$$

which would violate the definition of $d(r)$ as the diversity
order of the ML decoder. Consequently, at sufficiently high
SNR it must hold that $P(\hat{s}_{ML} = s \mid \Omega) > 1 - \epsilon$ for any $\epsilon > 0$.
We can thus choose $\epsilon > 0$ such that, under $\Omega$, it follows that
$\hat{s}_{ML} = s$ and $\|Q^H w\| > \epsilon$ with arbitrarily high probability,
implying that $\xi_{ML} > \epsilon$ where $\xi_{ML}$ is the minimum metric in
(13). In other words, there is some $\epsilon > 0$ for which

$$P\left(\Omega \cap \{\xi_{ML} > \epsilon\}\right) \,\dot{>}\, \rho^{-d(r)}.$$

Completing the proof of Lemma 2 with $\epsilon \doteq \rho^0$ in place of $\xi$ (as
$\xi \geq \xi_{ML} > \epsilon$ throughout the search) proves that $c(r) = \bar{c}(r)$
under (48) also if we allow for SD implementations that
adaptively choose and update the search radius. Finally, it
should be noted that what is shown here is not that adaptively
choosing the search radius cannot reduce complexity (it does),
but only that this reduction is not significant enough to
reduce the complexity exponent.

Appendix C 
Proof of Lemma 3 

Consider the matrix $\tilde{A}$, given by

$$\tilde{A} = U \tilde{\Sigma} V^H$$

where $\tilde{\Sigma} \triangleq \mathrm{Diag}(0, \ldots, 0, \sigma_{i+1}(A), \ldots, \sigma_n(A))$, and where
$U$ and $V$ contain the left and right singular vectors of $A$,
respectively. Partition $\tilde{A} \in \mathbb{C}^{m \times n}$ according to

$$\tilde{A} = [\tilde{A}_1 \;\; \tilde{A}_2]$$

where $\tilde{A}_1 \in \mathbb{C}^{m \times (n-k)}$ and $\tilde{A}_2 \in \mathbb{C}^{m \times k}$.

By the nature of the QR decomposition, it holds that

$$Q_2 R_{22} = \Pi_{A_1}^{\perp} A_2 \triangleq P$$

where $\Pi_{A_1}^{\perp}$ denotes the projection onto the orthogonal
complement of the range of $A_1$ (i.e., the nullspace of $A_1^H$).
Additionally, let

$$\tilde{P} \triangleq \Pi_{\tilde{A}_1}^{\perp} \tilde{A}_2.$$

As $Q_2$ is a unitary matrix it follows that

$$\sigma_i(R_{22}) = \sigma_i(P).$$

In what follows, we will consider the singular values of $P$ in
order to establish the lemma. To this end, we will make use
of two results due to Weyl and Stewart. For a modern proof
of Theorem 8 see e.g. [22, Corollary 7.3.8]. The statement in
Theorem 9 follows by combining [39, Theorem 2.3] and [39,
Theorem 2.4]. In the following, $\|B\| = \sigma_{\max}(B)$ denotes the
spectral matrix norm.

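Theorem 8 (Weyl): For arbitrary $B, C \in \mathbb{C}^{p \times q}$ it holds that

$$|\sigma_i(B) - \sigma_i(C)| \leq \|B - C\|. \tag{69}$$

Theorem 9 (Stewart): For $B, C \in \mathbb{C}^{p \times q}$ such that
rank($B$) = rank($C$),

$$\|\Pi_B^{\perp} - \Pi_C^{\perp}\| \leq \min\left(\|B^{\dagger}\|, \|C^{\dagger}\|\right) \|B - C\| \tag{70}$$

where $B^{\dagger}$ denotes the Moore-Penrose pseudo inverse [22].
By noting that

$$\|A_l - \tilde{A}_l\| \leq \|A - \tilde{A}\| = \sigma_i(A), \tag{71}$$

for $l = 1, 2$ and using the assumption that $\sigma_1(A_1) > \sigma_i(A)$
it follows by Theorem 8 that

$$\sigma_1(\tilde{A}_1) \geq \sigma_1(A_1) - \|A_1 - \tilde{A}_1\| \geq \sigma_1(A_1) - \sigma_i(A) > 0$$

implying that $\tilde{A}_1$ is full rank. As $\sigma_1(A_1) > 0$ is directly
implied by $\sigma_1(A_1) > \sigma_i(A)$ it follows that rank($\tilde{A}_1$) =
rank($A_1$) which makes Theorem 9 applicable to $\Pi_{A_1}^{\perp}$ and $\Pi_{\tilde{A}_1}^{\perp}$.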

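As

$$P - \tilde{P} = \Pi_{A_1}^{\perp} A_2 - \Pi_{\tilde{A}_1}^{\perp} \tilde{A}_2 = \left(\Pi_{A_1}^{\perp} - \Pi_{\tilde{A}_1}^{\perp}\right) A_2 + \Pi_{\tilde{A}_1}^{\perp} \left(A_2 - \tilde{A}_2\right)$$

it follows that

$$\|P - \tilde{P}\| \leq \|A_1^{\dagger}\| \, \|A_1 - \tilde{A}_1\| \, \|A_2\| + \|A_2 - \tilde{A}_2\|$$

where we used Theorem 9 and the fact that $\|BC\| \leq
\|B\|\|C\|$ and $\|\Pi_{\tilde{A}_1}^{\perp}\| \leq 1$ [22]. By noting that $\|A_1^{\dagger}\| =
1/\sigma_1(A_1)$, that $\|A_2\| \leq \|A\| = \sigma_n(A)$, that $\|A_1 - \tilde{A}_1\| \leq
\sigma_i(A)$ and that $\|A_2 - \tilde{A}_2\| \leq \sigma_i(A)$ (cf. (29)), it follows that

$$\left[\frac{\sigma_n(A)}{\sigma_1(A_1)} + 1\right] \sigma_i(A) \geq \|P - \tilde{P}\|. \tag{72}$$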

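By again applying Theorem 8 to (72) it follows that

$$\sigma_i(P) \leq \sigma_i(\tilde{P}) + \left[\frac{\sigma_n(A)}{\sigma_1(A_1)} + 1\right] \sigma_i(A).$$

Note however that

$$\mathrm{rank}(\tilde{A}) = \mathrm{rank}(\tilde{A}_1) + \mathrm{rank}(\tilde{P})$$

where $\tilde{P} = \Pi_{\tilde{A}_1}^{\perp} \tilde{A}_2 \in \mathbb{C}^{m \times k}$. As rank($\tilde{A}_1$) = $n - k$ and
rank($\tilde{A}$) $\leq n - i$ it follows that

$$\mathrm{rank}(\tilde{P}) \leq k - i$$

and $\sigma_i(\tilde{P}) = 0$. Thus,

$$\sigma_i(R_{22}) = \sigma_i(P) \leq \left[\frac{\sigma_n(A)}{\sigma_1(A_1)} + 1\right] \sigma_i(A),$$

establishing the lemma.

Appendix D
Proof of Theorem 7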

Let $\underline{\mathcal{X}}$ be the un-normalized extended codebook corresponding
to the un-normalized lattice points $G\mathbb{S}_\infty^\kappa$, i.e., where
$\mathcal{X} \subset \theta\underline{\mathcal{X}}$. A space-time code is said to satisfy the
non-vanishing determinant (NVD) condition if [40]

$$\inf_{X \in \underline{\mathcal{X}},\, X \neq 0} |X| > 0, \tag{73}$$

i.e., if there are no non-zero un-normalized (difference) codewords
with arbitrarily small determinants. The proof of the
theorem draws from the well known fact that the NVD
condition is a necessary condition for achieving approximate
universality, and divides the problem into a few (exhaustive)
cases where either the condition in Lemma 2 is shown to hold,
or the NVD property is shown to be violated, thus eliminating
the possibility of NVD codes that would violate the rank
condition in Lemma 2. To this end, consider a partitioning
of the 4×4 generator matrix according to

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}$$

where $G_{ij} \in \mathbb{C}^{2 \times 2}$. We first note that the case where
$p = 2$ is trivially satisfied as $G$ is full rank, i.e., the matrix

$$(I_2 \otimes U_2^H) G$$

is full rank for any unitary $U_2 \in \mathbb{C}^{2 \times 2}$. We can thus restrict
attention to the case of $p = 1$ and consider the rank of

$$(I_2 \otimes u^H) G|_1 = \begin{bmatrix} u^H G_{11} \\ u^H G_{21} \end{bmatrix} \in \mathbb{C}^{2 \times 2} \tag{74}$$

where $u = U_1 \in \mathbb{C}^{2 \times 1}$. In the cases where the NVD property
is shown to not hold, it is sufficient to consider non-zero
(un-normalized) codewords of the form

$$x = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$

where $s_1 \in \mathbb{S}_\infty^2$, and $s_1 \neq 0$, and where $s_2 = 0$. The
(un-normalized) codewords in matrix form are in this case
given by $X = [G_{11}s_1 \;\; G_{21}s_1]$ and we have $u^H X =
[u^H G_{11}s_1 \;\; u^H G_{21}s_1]$. All codewords discussed in what
follows are assumed to have this structure.

We will now consider different cases depending on the rank
of $G_{11}$ and $G_{21}$. However, as it is straightforward to see that
$X$ has zero determinant (for any $s_1$) if either $G_{11} = 0$ or
$G_{21} = 0$, the cases that need consideration are those when
the rank of both $G_{11}$ and $G_{21}$ is equal to one (case a), when
the rank of both $G_{11}$ and $G_{21}$ is equal to two (case b), and
when the rank of either $G_{11}$ or $G_{21}$ is equal to one and the
rank of the other is equal to two (case c).

A. Case a 

Consider the case where the rank of both $G_{11}$ and $G_{21}$ is
one, i.e., where $G_{11} = b_1 a_1^H$ and $G_{21} = b_2 a_2^H$. If $a_1$ and $a_2$
are not linearly dependent, the condition in (74) is satisfied
for any $u$ such that $u^H b_1 \neq 0$ and $u^H b_2 \neq 0$, and we can
thus restrict attention to the case where $a_1$ and $a_2$ are linearly
dependent. Here, we may without loss of generality assume
that $a_1 = a_2 = a$ by absorbing any complex scalars into $b_1$
and $b_2$.

Note however that given any $\epsilon > 0$ we can always find a
point $s_1 \in \mathbb{S}_\infty^2$ where $s_1 \neq 0$ such that $\|a^H s_1\|^2 < \epsilon$. For
any such $s_1$ it follows that (cf. [22, Theorem 7.3.10])

$$\sigma_{\max}^2(X) \leq \epsilon\left(\|b_1\|^2 + \|b_2\|^2\right),$$

i.e., the maximal singular value of $X$ can be made arbitrarily
small. However, this violates the assumed NVD property of
the code as a small maximal singular value implies a small
determinant, and concludes case a.
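The existence of non-zero Gaussian integer points with small $\|a^H s_1\|^2$ can be illustrated by an exhaustive search over a growing box. The sketch below is only an illustration: the function name, the search radius and the example vector $a = (1, \sqrt{2})$ are assumptions, and a finite search of course does not prove the claim.

```python
import math
from itertools import product


def min_linear_form(a, radius):
    """Smallest |a^H s|^2 over non-zero Gaussian integer pairs s.

    The real and imaginary parts of both entries of s are restricted
    to the integers -radius, ..., radius.
    """
    best = math.inf
    grid = range(-radius, radius + 1)
    for x1, y1, x2, y2 in product(grid, repeat=4):
        if x1 == y1 == x2 == y2 == 0:
            continue  # skip the zero vector
        s = (complex(x1, y1), complex(x2, y2))
        value = abs(a[0].conjugate() * s[0] + a[1].conjugate() * s[1]) ** 2
        best = min(best, value)
    return best
```

For `a = (1, sqrt(2))` the minimum over radius 7 is attained at points such as $s_1 = (7, -5)$, where $7 - 5\sqrt{2} \approx -0.071$, and enlarging the radius can only decrease the minimum.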

B. Case b 

When $G_{11}$ and $G_{21}$ are full rank we can always find a vector
$u$ such that $u^H G_{11}$ and $u^H G_{21}$ are linearly independent
(thus satisfying the condition of Lemma 2) unless $G_{11}$ and
$G_{21}$ are linearly dependent, i.e., when $G_{11} = aG_{21}$ for some
$a \in \mathbb{C}$. However, in this case we have that $G_{11}s_1 = aG_{21}s_1$
for any $s_1$ which implies that the columns of $X$ are linearly
dependent, and the determinant of $X$ is zero. This concludes case b.

C. Case c 

In this case we may assume that $G_{11} = b_1 a_1^H$ has rank
one and $G_{21}$ has rank two (the opposite case is handled
equivalently). Here, as both the set of $u$ for which $u^H b_1 = 0$
and the set where $u^H G_{21}$ is linearly dependent on $a_1^H$ have zero
measure, we may pick $u$ such that $u^H b_1 \neq 0$ and such
that $u^H G_{21}$ is linearly independent of $a_1^H$, thus satisfying
the conditions of Lemma 2. This concludes the proof of
Theorem 7.